Novel Proteins and Nucleic Acids Encoding Same

Field of the Invention 

The present invention relates to novel polypeptides that are targets of small molecule 
drugs and that have properties related to stimulation of biochemical or physiological responses 
in a cell, a tissue, an organ or an organism. More particularly, the novel polypeptides are gene 
products of novel genes, or are specified biologically active fragments or derivatives thereof. 
Methods of use encompass diagnostic and prognostic assay procedures as well as methods of 
treating diverse pathological conditions. The present invention discloses novel associations of 
proteins and polypeptides and the nucleic acids that encode them with various diseases or 
pathologies. The proteins, and related proteins that are similar to them, are encoded by a
cDNA and/or by genomic DNA. The proteins, polypeptides and their cognate nucleic acids
were identified by CuraGen Corporation in certain cases. The XYZase-encoded protein and
any variants thereof are suitable as diagnostic markers, targets for an antibody therapeutic
and targets for small molecule drugs. As such, the current invention embodies the use of
recombinantly expressed and/or endogenously expressed protein in various screens to identify 
such therapeutic antibodies and/or therapeutic small molecules.

Background

Eukaryotic cells are characterized by biochemical and physiological processes which 
under normal conditions are exquisitely balanced to achieve the preservation and propagation 
of the cells. When such cells are components of multicellular organisms such as vertebrates, 
or more particularly organisms such as mammals, the regulation of the biochemical and 
physiological processes involves intricate signaling pathways. Frequently, such signaling 
pathways are constituted of extracellular signaling proteins, cellular receptors that bind the 
signaling proteins and signal transducing components located within the cells. 

Signaling proteins may be classified as endocrine effectors, paracrine effectors or 
autocrine effectors. Endocrine effectors are signaling molecules secreted by a given organ 
into the circulatory system, which are then transported to a distant target organ or tissue. The 
target cells include the receptors for the endocrine effector, and when the endocrine effector 
binds, a signaling cascade is induced. Paracrine effectors involve secreting cells and receptor 
cells in close proximity to each other, for example two different classes of cells in the same 
tissue or organ. One class of cells secretes the paracrine effector, which then reaches the 
second class of cells, for example by diffusion through the extracellular fluid. The second 
class of cells contains the receptors for the paracrine effector; binding of the effector results in 
induction of the signaling cascade that elicits the corresponding biochemical or physiological 
effect. Autocrine effectors are highly analogous to paracrine effectors, except that the same 
cell type that secretes the autocrine effector also contains the receptor. Thus the autocrine 
effector binds to receptors on the same cell, or on identical neighboring cells. The binding 
process then elicits the characteristic biochemical or physiological effect. 

Signaling processes may elicit a variety of effects on cells and tissues including by way 
of nonlimiting example induction of cell or tissue proliferation, suppression of growth or 
proliferation, induction of differentiation or maturation of a cell or tissue, and suppression of 
differentiation or maturation of a cell or tissue. 

Many pathological conditions involve dysregulation of expression of important 
effector proteins. In certain classes of pathologies the dysregulation is manifested as 
diminished or suppressed levels of synthesis and secretion of protein effectors. In a clinical
setting a subject may be suspected of suffering from a condition brought on by diminished or 
suppressed levels of a protein effector of interest. Therefore there is a need to be able to assay 
for the level of the protein effector of interest in a biological sample from such a subject, and 
to compare the level with that characteristic of a nonpathological condition. There further is a
need to provide the protein effector as a product of manufacture. Administration of the
effector to a subject in need thereof is useful in treatment of the pathological condition, or the
protein effector deficiency or suppression may be favorably acted upon by the administration
of another small molecule drug product. Accordingly, there is a need for a method of
treatment of a pathological condition brought on by diminished or suppressed levels of the
protein effector of interest.

Small molecule targets have been implicated in various disease states or pathologies. 
These targets may be proteins, and particularly enzymatic proteins, which are acted upon by 
small molecule drugs for the purpose of altering target function and achieving a desired result. 
Cellular, animal and clinical studies can be performed to elucidate the genetic contribution to 
the etiology and pathogenesis of conditions in which small molecule targets are implicated in a 
variety of physiologic, pharmacologic or native states. These studies utilize the core 
technologies at CuraGen Corporation to look at differential gene expression, protein-protein 
interactions, large-scale sequencing of expressed genes and the association of genetic 
variations such as, but not limited to, single nucleotide polymorphisms (SNPs) or splice 
variants in and between biological samples from experimental and control groups. The goal of 
such studies is to identify potential avenues for therapeutic intervention in order to prevent or
cure the conditions or to treat their consequences.

In order to treat diseases, pathologies and other abnormal states or conditions in which 
a mammalian organism has been diagnosed as being, or as being at risk for becoming, other 
than in a normal state or condition, it is important to identify new therapeutic agents. Such a 
procedure includes at least the steps of identifying a target component within an affected tissue 
or organ, and identifying a candidate therapeutic agent that modulates the functional attributes 
of the target. The target component may be any biological macromolecule implicated in the 
disease or pathology. Commonly the target is a polypeptide or protein with specific functional 
attributes. Other classes of target macromolecule include nucleic acids, polysaccharides, and
lipids such as complex lipids or glycolipids; in addition, a target may be a sub-cellular or
extra-cellular structure comprising more than one of these classes of macromolecule.
Once such a target has been identified, it may be employed in a screening assay in order to 
identify favorable candidate therapeutic agents from among a large population of substances 
or compounds. 

In many cases the objective of such screening assays is to identify small molecule 
candidates; this is commonly approached by the use of combinatorial methodologies to 
develop the population of substances to be tested. The implementation of high throughput
screening methodologies is advantageous when working with large, combinatorial libraries of
compounds.

It is an objective of this invention to provide at least one target biopolymer that is 
intended to serve as the macromolecular component in a screening assay for identifying 
candidate pharmaceutical agents. 

It is another objective of the present invention to provide screening assays that 
positively identify candidate pharmaceutical agents from among a combinatorial library of low 
molecular weight substances or compounds. 

It is still a further objective of this invention to employ the candidate pharmaceutical 
agents in any of a variety of in vitro, ex vivo and in vivo assays in order to identify 
pharmaceutical agents with advantageous therapeutic applications in the treatment of a 
disease, pathology, or abnormal state or condition in a mammal. 



Summary Of The Invention 

The invention is based in part upon the discovery of nucleic acid sequences encoding 
novel polypeptides. These nucleic acids and polypeptides, as well as derivatives, homologs, 
analogs and fragments thereof, will hereinafter be collectively designated as "NOVX" nucleic
acids, which represent the nucleotide sequences selected from the group consisting of SEQ ID
NO: 2n-1, wherein n is an integer between 1 and 178, or "NOVX" polypeptide sequences, which
represent the group consisting of SEQ ID NO: 2n, wherein n is an integer between 1 and 178.
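
The numbering convention above pairs each nucleic acid sequence with the polypeptide it encodes. As a minimal sketch (the function name `seq_id_pair` is hypothetical and not part of the specification), the mapping from the index n to the two SEQ ID numbers can be written as:

```python
def seq_id_pair(n: int) -> tuple[int, int]:
    """Return (nucleic acid SEQ ID NO, amino acid SEQ ID NO) for index n.

    Follows the stated convention: nucleic acid = 2n-1, polypeptide = 2n,
    with n an integer between 1 and 178.
    """
    if not 1 <= n <= 178:
        raise ValueError("n must be an integer between 1 and 178")
    return 2 * n - 1, 2 * n


if __name__ == "__main__":
    print(seq_id_pair(1))  # (1, 2)
    print(seq_id_pair(2))  # (3, 4)
```

The first two results agree with the first two rows of Table 1, where nucleic acid SEQ ID NO. 1 is paired with amino acid SEQ ID NO. 2, and 3 with 4.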

In one aspect, the invention provides an isolated polypeptide comprising a mature form
of a NOVX amino acid sequence. One example is a variant of a mature form of a NOVX amino acid
sequence, wherein any amino acid in the mature form is changed to a different amino acid,
provided that no more than 15% of the amino acid residues in the sequence of the mature form
are so changed. The amino acid sequence can be, for example, a NOVX amino acid sequence or a
variant of a NOVX amino acid sequence, wherein any amino acid specified in the chosen
sequence is changed to a different amino acid, provided that no more than 15% of the amino
acid residues in the sequence are so changed. The invention also includes fragments of any of
these. In another aspect, the invention also includes an isolated nucleic acid that encodes a 
NOVX polypeptide, or a fragment, homolog, analog or derivative thereof. 
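
The 15% limit on changed residues can be illustrated with a short check. This is only a sketch under stated assumptions: it treats a variant as a substitution-only change (equal lengths, position-by-position comparison), and the function names and example sequences are hypothetical, not taken from the sequence listing.

```python
def count_changed(reference: str, variant: str) -> int:
    """Count positions at which the variant differs from the reference."""
    if len(reference) != len(variant):
        raise ValueError("substitution-only comparison requires equal lengths")
    if not reference:
        raise ValueError("reference sequence must not be empty")
    return sum(1 for a, b in zip(reference, variant) if a != b)


def within_variant_limit(reference: str, variant: str, max_percent: int = 15) -> bool:
    """True if no more than max_percent of residues are changed.

    Integer arithmetic avoids floating-point edge cases at the boundary.
    """
    changed = count_changed(reference, variant)
    return changed * 100 <= max_percent * len(reference)


if __name__ == "__main__":
    ref = "MKTAYIAKQRQISFVKSHFS"  # hypothetical 20-residue sequence
    var = "MKTAYLAKQRQLSFVKSHFA"  # 3 of 20 residues changed (15%)
    print(count_changed(ref, var), within_variant_limit(ref, var))
```

The same check applies to the 10% limit stated for nucleic acid fragments by passing `max_percent=10`.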

Also included in the invention is a NOVX polypeptide that is a naturally occurring 
allelic variant of a NOVX sequence. In one embodiment, the allelic variant includes an amino 
acid sequence that is the translation of a nucleic acid sequence differing by a single nucleotide
from a NOVX nucleic acid sequence. In another embodiment, the NOVX polypeptide is a
variant polypeptide described therein, wherein any amino acid specified in the chosen 
sequence is changed to provide a conservative substitution. In one embodiment, the invention 
discloses a method for determining the presence or amount of the NOVX polypeptide in a 
sample. The method involves the steps of: providing a sample; introducing the sample to an 
antibody that binds immunospecifically to the polypeptide; and determining the presence or 
amount of antibody bound to the NOVX polypeptide, thereby determining the presence or 
amount of the NOVX polypeptide in the sample. In another embodiment, the invention 
provides a method for determining the presence of or predisposition to a disease associated 
with altered levels of a NOVX polypeptide in a mammalian subject. This method involves the 
steps of: measuring the level of expression of the polypeptide in a sample from the first 
mammalian subject; and comparing the amount of the polypeptide in the sample of the first 
step to the amount of the polypeptide present in a control sample from a second mammalian 
subject known not to have, or not to be predisposed to, the disease, wherein an alteration in the 
expression level of the polypeptide in the first subject as compared to the control sample 
indicates the presence of or predisposition to the disease. 
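
The comparison step of this diagnostic method can be sketched as follows. The specification does not state how large an alteration must be, so the two-fold threshold below is purely an assumption for illustration, as are the function name and the example values.

```python
def expression_altered(subject_level: float, control_level: float,
                       fold_threshold: float = 2.0) -> bool:
    """Flag an altered expression level relative to a control sample.

    ASSUMPTION: an "alteration" is taken to mean a change of at least
    fold_threshold in either direction; the source gives no numeric cutoff.
    """
    if subject_level < 0 or control_level <= 0:
        raise ValueError("levels must be non-negative and control must be positive")
    if fold_threshold <= 1:
        raise ValueError("fold_threshold must be greater than 1")
    ratio = subject_level / control_level
    return ratio >= fold_threshold or ratio <= 1 / fold_threshold


if __name__ == "__main__":
    print(expression_altered(10.0, 4.0))  # elevated relative to control
    print(expression_altered(5.0, 4.0))   # within the assumed normal range
    print(expression_altered(1.0, 4.0))   # diminished relative to control
```

In practice the cutoff would be set from the distribution of levels in subjects known not to have the disease; a fixed fold change is used here only to keep the sketch short.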

In a further embodiment, the invention includes a method of identifying an agent that 
binds to a NOVX polypeptide. This method involves the steps of: introducing the polypeptide 
to the agent; and determining whether the agent binds to the polypeptide. In various 
embodiments, the agent is a cellular receptor or a downstream effector. 

In another aspect, the invention provides a method for identifying a potential 
therapeutic agent for use in treatment of a pathology, wherein the pathology is related to 
aberrant expression or aberrant physiological interactions of a NOVX polypeptide. The 
method involves the steps of: providing a cell expressing the NOVX polypeptide and having a 
property or function ascribable to the polypeptide; contacting the cell with a composition 
comprising a candidate substance; and determining whether the substance alters the property 
or function ascribable to the polypeptide; whereby, if an alteration observed in the presence of 
the substance is not observed when the cell is contacted with a composition devoid of the 
substance, the substance is identified as a potential therapeutic agent. In another aspect, the 
invention describes a method for screening for a modulator of activity or of latency or 
predisposition to a pathology associated with the NOVX polypeptide. This method involves
the following steps: administering a test compound to a test animal at increased risk for a
pathology associated with the NOVX polypeptide, wherein the test animal recombinantly
expresses the NOVX polypeptide; measuring the activity of
the NOVX polypeptide in the test animal after administering the test compound; and
comparing the activity of the protein in the test animal with the activity of the NOVX 
polypeptide in a control animal not administered the polypeptide, wherein a change in the 
activity of the NOVX polypeptide in the test animal relative to the control animal indicates the 
test compound is a modulator of latency of, or predisposition to, a pathology associated with 
the NOVX polypeptide. In one embodiment, the test animal is a recombinant test animal that 
expresses a test protein transgene or expresses the transgene under the control of a promoter at 
an increased level relative to a wild-type test animal, and wherein the promoter is not the 
native gene promoter of the transgene. In another aspect, the invention includes a method for 
modulating the activity of the NOVX polypeptide, the method comprising introducing a cell 
sample expressing the NOVX polypeptide with a compound that binds to the polypeptide in an 
amount sufficient to modulate the activity of the polypeptide. 

The invention also includes an isolated nucleic acid that encodes a NOVX polypeptide, or 
a fragment, homolog, analog or derivative thereof. In a preferred embodiment, the nucleic 
acid molecule comprises the nucleotide sequence of a naturally occurring allelic nucleic acid 
variant. In another embodiment, the nucleic acid encodes a variant polypeptide, wherein the 
variant polypeptide has the polypeptide sequence of a naturally occurring polypeptide variant. 
In another embodiment, the nucleic acid molecule differs by a single nucleotide from a NOVX 
nucleic acid sequence. In one embodiment, the NOVX nucleic acid molecule hybridizes under 
stringent conditions to the nucleotide sequence selected from the group consisting of SEQ ID 
NO: 2n-1, wherein n is an integer between 1 and 178, or a complement of the nucleotide
sequence. In another aspect, the invention provides a vector or a cell expressing a NOVX 
nucleotide sequence. 
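
For readers less familiar with the term, the complement of a nucleotide sequence is conventionally written as the reverse complement, that is, the complementary strand read 5' to 3'. A minimal sketch follows; the helper name and the example sequence are hypothetical, and only the unambiguous DNA bases A, C, G and T are handled.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of a DNA sequence, read 5' to 3'."""
    invalid = set(seq) - set("ACGTacgt")
    if invalid:
        raise ValueError(f"unsupported characters: {sorted(invalid)}")
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # GCAT
```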

In one embodiment, the invention discloses a method for modulating the activity of a 
NOVX polypeptide. The method includes the steps of: introducing a cell sample expressing 
the NOVX polypeptide with a compound that binds to the polypeptide in an amount sufficient 
to modulate the activity of the polypeptide. In another embodiment, the invention includes an 
isolated NOVX nucleic acid molecule comprising a nucleic acid sequence encoding a 
polypeptide comprising a NOVX amino acid sequence or a variant of a mature form of the 
NOVX amino acid sequence, wherein any amino acid in the mature form of the chosen 
sequence is changed to a different amino acid, provided that no more than 15% of the amino 
acid residues in the sequence of the mature form are so changed. In another embodiment, the 
invention includes an amino acid sequence that is a variant of the NOVX amino acid 
sequence, in which any amino acid specified in the chosen sequence is changed to a different
amino acid, provided that no more than 15% of the amino acid residues in the sequence are so
changed. 

In one embodiment, the invention discloses a NOVX nucleic acid fragment encoding at 
least a portion of a NOVX polypeptide or any variant of the polypeptide, wherein any amino 
acid of the chosen sequence is changed to a different amino acid, provided that no more than 
10% of the amino acid residues in the sequence are so changed. In another embodiment, the 
invention includes the complement of any of the NOVX nucleic acid molecules or a naturally 
occurring allelic nucleic acid variant. In another embodiment, the invention discloses a 
NOVX nucleic acid molecule that encodes a variant polypeptide, wherein the variant 
polypeptide has the polypeptide sequence of a naturally occurring polypeptide variant. In 
another embodiment, the invention discloses a NOVX nucleic acid, wherein the nucleic acid 
molecule differs by a single nucleotide from a NOVX nucleic acid sequence. 
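
Whether two nucleic acid sequences differ by a single nucleotide can be checked directly. The sketch below assumes the difference is a single substitution between sequences of equal length; insertions and deletions are outside its scope, and the function name and example sequences are hypothetical.

```python
def differs_by_single_nucleotide(seq_a: str, seq_b: str) -> bool:
    """True if the two sequences differ at exactly one position.

    ASSUMPTION: only substitutions are considered, so sequences of
    different lengths are reported as not matching the criterion.
    """
    if len(seq_a) != len(seq_b):
        return False
    mismatches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return mismatches == 1


if __name__ == "__main__":
    print(differs_by_single_nucleotide("ATGGCC", "ATGACC"))  # True
    print(differs_by_single_nucleotide("ATGGCC", "ATGGCC"))  # False
```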

In another aspect, the invention includes a NOVX nucleic acid, wherein one or more 
nucleotides in the NOVX nucleotide sequence is changed to a different nucleotide provided 
that no more than 15% of the nucleotides are so changed. In one embodiment, the invention 
discloses a nucleic acid fragment of the NOVX nucleotide sequence and a nucleic acid 
fragment wherein one or more nucleotides in the NOVX nucleotide sequence is changed from 
that selected from the group consisting of the chosen sequence to a different nucleotide 
provided that no more than 15% of the nucleotides are so changed. In another embodiment, 
the invention includes a nucleic acid molecule wherein the nucleic acid molecule hybridizes 
under stringent conditions to a NOVX nucleotide sequence or a complement of the NOVX 
nucleotide sequence. In one embodiment, the invention includes a nucleic acid molecule, 
wherein the sequence is changed such that no more than 15% of the nucleotides in the coding 
sequence differ from the NOVX nucleotide sequence or a fragment thereof. 

In a further aspect, the invention includes a method for determining the presence or 
amount of the NOVX nucleic acid in a sample. The method involves the steps of: providing 
the sample; introducing the sample to a probe that binds to the nucleic acid molecule; and 
determining the presence or amount of the probe bound to the NOVX nucleic acid molecule, 
thereby determining the presence or amount of the NOVX nucleic acid molecule in the 
sample. In one embodiment, the presence or amount of the nucleic acid molecule is used as a 
marker for cell or tissue type. 

In another aspect, the invention discloses a method for determining the presence of or 
predisposition to a disease associated with altered levels of the NOVX nucleic acid molecule
in a first mammalian subject. The method involves the steps of: measuring the amount of
NOVX nucleic acid in a sample from the first mammalian subject; and comparing the amount
of the nucleic acid in the sample of the first step to the amount of NOVX nucleic acid present in a
control sample from a second mammalian subject known not to have, or not to be predisposed to,
the disease; wherein an alteration in the level of the nucleic acid in the first subject as 
compared to the control sample indicates the presence of or predisposition to the disease. 

Unless otherwise defined, all technical and scientific terms used herein have the same 
meaning as commonly understood by one of ordinary skill in the art to which this invention
belongs. Although methods and materials similar or equivalent to those described herein can 
be used in the practice or testing of the present invention, suitable methods and materials are 
described below. All publications, patent applications, patents, and other references 
mentioned herein are incorporated by reference in their entirety. In the case of conflict, the 
present specification, including definitions, will control. In addition, the materials, methods, 
and examples are illustrative only and not intended to be limiting. 

Other features and advantages of the invention will be apparent from the following 
detailed description and claims. 

Detailed Description Of The Invention 

The present invention provides novel nucleotides and polypeptides encoded thereby. 
Included in the invention are the novel nucleic acid sequences, their encoded polypeptides, 
antibodies, and other related compounds. The sequences are collectively referred to herein as 
"NOVX nucleic acids" or "NOVX polynucleotides" and the corresponding encoded 
polypeptides are referred to as "NOVX polypeptides" or "NOVX proteins." Unless indicated 
otherwise, "NOVX" is meant to refer to any of the novel sequences disclosed herein. Table 1 
provides a summary of the NOVX nucleic acids and their encoded polypeptides. 



TABLE 1. Sequences and Corresponding SEQ ID Numbers

NOVX No. | Internal Acc. No. | Nucleic Acid SEQ ID NO. | Amino Acid SEQ ID NO. | Homology
1a | CG58522-01 | 1 | 2 | human platelet activating factor acetylhydrolase
2a | CG58520-01 | 3 | 4 | GABA(A) receptor
2b | CG58520-02 | 5 | 6 | GABA(A) receptor
2c | CG58520-03 | 7 | 8 | GABA(A) receptor






WO 02/072757 



PCT/US02/06908 



3a | CG58518-01 | 9 | 10 | GABA(A) receptor
 | CG58516-01 | 11 | 12 | Beta transducin
5a | CG58473-01 | 13 | 14 | Protein kinase
6a | CG58470-01 | 15 | 16 | UDP-N-acetylhexosamine pyrophosphorylase
7a | CG58593-01 | 17 | 18 | ubiquitin 52 like
8a | CG57871-01 | 19 | 20 | tousled like kinase like
9a | CG58590-01 | 21 | 22 | guanylate kinase like
 | CG58590-02 | 23 | 24 | guanylate kinase like
10a | CG58572-01 | 25 | 26 | glucosamine phosphate N acetyl transferase like
10b | CG58572-02 | 27 | 28 | glucosamine phosphate N acetyl transferase like
11a | CG58564-01 | 29 | 30 | Protein tyrosine phosphatase like
11b | CG58564-02 | 31 | 32 | Protein tyrosine phosphatase like
11c | CG58564-03 | 33 | 34 | Dual-Specificity phosphatase like
11d | CG58564-04 | 35 | 36 | Dual-Specificity phosphatase like


12a | CG57819-01 | 37 | 38 | RPGR interacting protein 1 like
13a | CG57789-01 | 39 | 40 | RAS like protein RRP22 like
13b | CG57789-02 | 41 | 42 | RAS like protein RRP22 like
14a | CG57758-01 | 43 | 44 | sodium/lithium dependent dicarboxylate transporter like
14b | CG57758-02 | 45 | 46 | sodium/lithium dependent dicarboxylate transporter like
14c | CG57758-03 | 47 | 48 | sodium/lithium dependent dicarboxylate transporter like
14d | CG57758-04 | 49 | 50 | sodium/lithium dependent dicarboxylate transporter like
14e | CG57758-05 | 51 | 52 | sodium/lithium dependent dicarboxylate transporter like
15a | CG57732-01 | 53 | 54 | Ca2+ calmodulin dependent protein kinase IV kinase like
15b | CG57732-02 | 55 | 56 | Ca2+ calmodulin dependent protein kinase IV kinase like
15c | CG57732-03 | 57 | 58 | Ca2+ calmodulin dependent protein kinase IV kinase like


16a | CG57709-01 | 59 | 60 | TCE2 like
17a | CG57700-01 | 61 | 62 | hydroxyacylglutathione hydrolase like
17b | CG57700-02 | 63 | 64 | hydroxyacylglutathione hydrolase like
17c | CG57700-03 | 65 | 66 | hydroxyacylglutathione hydrolase like
17d | CG57700-04 | 67 | 68 | hydroxyacylglutathione hydrolase like
18a | CG58553-01 | 69 | 70 | vasopressin receptor like
19a | CG58626-01 | 71 | 72 | phosphatidic acid preferring phospholipase A1 like
20a | CG57597-01 | 73 | 74 | hypothetical protein like
21a | CG57804-01 | 75 | 76 | Talin like
22a | CG57551-01 | 77 | 78 | NAC-1 like
23a | CG57411-01 | 79 | 80 | Kelch like
24a | CG57399-01 | 81 | 82 | phospholipase ADRAB-B precursor like
24b | CG57399-02 | 83 | 84 | phospholipase ADRAB-B precursor like
24c | CG57399-03 | 85 | 86 | phospholipase ADRAB-B precursor like


25a | CG59311-01 | 87 | 88 | acyl-coenzyme A thioester hydrolase
25b | CG59311-02 | 89 | 90 | peroxisomal acyl-coenzyme A thioester hydrolase like
25c | CG59311-03 | 91 | 92 | peroxisomal acyl-coenzyme A thioester hydrolase like
26a | CG59309-01 | 93 | 94 | acyl-coenzyme A thioester hydrolase
27a | CG57364-01 | 95 | 96 | CG6896
28a | CG59348-01 | 97 | 98 | cytoplasmic protein (patent calls this Cyclin L-like)
29a | CG59245-01 | 99 | 100 | glucose 6-phosphatase
29b | CG59245-02 | 101 | 102 | glucose 6-phosphatase
30a | CG59241-01 | 103 | 104 | Amiloride-sensitive sodium channel
31a | CG58602-01 | 105 | 106 | FAD binding domain containing protein
32a | CG58468-01 | 107 | 108 | Serum Amyloid Protein
33a | CG58183-01 | 109 | 110 | N-Methyl-D-Aspartate receptor
34a | CG59315-01 | 111 | 112 | Connexin


35a | CG59203-01 | 113 | 114 | lysozyme C
35b | CG59203-02 | 115 | 116 | lysozyme C
36a | CG58662-01 | 117 | 118 | cytoplasmic protein
36b | CG58662-02 | 119 | 120 | cytoplasmic protein
37a | CG58584-01 | 121 | 122 | 40S ribosomal protein S29 like
38a | CG58538-01 | 123 | 124 | Histone deacetylase complex protein 66 like
39a | CG59371-01 | 125 | 126 | expressed cytoplasmic protein like
40a | CG59346-01 | 127 | 128 | cortactin binding protein 1 like
41a | CG57814-01 | 129 | 130 | Basic 119 like homo sapiens
41b | CG57814-02 | 131 | 132 | Basic 119 like homo sapiens
42a | CG59327-01 | 133 | 134 | Monocarboxylate transporter 1 like
43a | CG59494-01 | 135 | 136 | myelin P2 like
44a | CG59432-01 | 137 | 138 | chloride channel like
44b | CG59432-02 | 139 | 140 | chloride channel like
45a | CG59394-01 | 141 | 142 | GPCR like
46a | CG59383-01 | 143 | 144 | D6MM5E PROTEIN like
46b | CG59383-02 | 145 | 146 | D6MM5E PROTEIN like
47a | CG58526-01 | 147 | 148 | scramblase like
48a | CG57851-01 | 149 | 150 | sulfotransferase like
49a | CG59177-01 | 151 | 152 | pn**in like
50a | CG59258-01 | 153 | 154 | transcriptional activator like
51a | CG59492-01 | 155 | 156 | Myosin Head (Motor Domain) like
52a | CG59564-01 | 157 | 158 | Sorting nexin 6 like
53a | CG59553-01 | 159 | 160 | Secretory protein SEC8 like
54a | CG59545-01 | 161 | 162 | Placental protein 13 like
55a |  | 163 | 164 | Nedd-1 like
55b | CG59435-02 | 165 | 166 | Nedd-1 like


56a | CG59439-01 | 167 | 168 | Xenobiotic/medium-chain fatty acid:CoA ligase form XL-III like
56b | CG59439-02 | 169 | 170 | Xenobiotic/medium-chain fatty acid:CoA ligase form XL-III like
57a | CG59354-01 | 171 | 172 | phosducin like
57b | CG59354-02 | 173 | 174 | phosducin like
57c | CG59354-03 | 175 | 176 | phosducin like
58a | CG59319-01 | 177 | 178 | phosducin like
58b | CG59319-02 | 179 | 180 | phosducin like
59a | CG59576-01 | 181 | 182 | GPCR like
60a | CG59557-01 | 183 | 184 | GPCR like






61a | CG59555-01 | 185 | 186 | GPCR like
62a | CG59551-01 | 187 | 188 | GPCR like
63a | CG59540-01 | 189 | 190 | GPCR like
64a | CG59280-01 | 191 | 192 | GPCR like
64b | CG59280-02 | 193 | 194 | GPCR like
65a | CG59568-01 | 195 | 196 | GPCR like
66a | CG59224-01 | 197 | 198 | GPCR like
67a | CG59222-01 | 199 | 200 | GPCR like
68a | CG59220-01 | 201 | 202 | GPCR like
69a | CG59218-01 | 203 | 204 | GPCR like
70a | CG59216-01 | 205 | 206 | GPCR like
71a | CG59214-01 | 207 | 208 | GPCR like
72a | CG59211-01 | 209 | 210 | GPCR like
73a | CG59276-01 | 211 | 212 | Dihydroorotate dehydrogenase like
74a | CG59268-01 | 213 | 214 | monooxygenase like
75a | CG59549-01 | 215 | 216 | TT39fi like (cytoplasmic protein with WTD repeat domain)
76a | CG59641-01 | 217 | 218 | Acetyl-CoA Carboxylase 2 like
77a | CG59630-01 | 219 | 220 | Midnolin like
78a | CG59561-01 | 221 | 222 | ACYL COENZYME A THIOESTER HYDROLASE like
79a | CG59452-01 | 223 | 224 | CELL PROLIFERATION RELATED PROTEIN CAP like
80a | CG59572-01 | 225 | 226 | Pseudouridine Synthase 3 like
80b | CG59572-02 | 227 | 228 | Pseudouridine Synthase 3 like


81a |  | 229 | 230 | iviyuaiii uivv*
82a | CG59520-01 | 231 | 232 | Farnesyl pyrophosphate synthetase like
83a | CG59758-01 | 233 | 234 | UBIQUITIN like
83b | CG59758-02 | 235 | 236 | UBIQUITIN like
84a | CG59586-01 | 237 | 238 | glucokinase like
85a | CG59704-01 | 239 | 240 | Serine/threonine kinase like
86a | CG59628-01 | 241 | 242 | Short-chain dehydrogenase like
87a | CG59516-01 | 243 | 244 | Calponin like
87b | CG59516-02 | 245 | 246 | Calponin like
88a | CG59671-02 | 247 | 248 | acyl-coenzyme A thioester hydrolase
89a | CG56870-01 | 249 | 250 | NDRG3 like
89b | CG56870-02 | 251 | 252 | NDRG3 like
89c | CG56870-03 | 253 | 254 | NDRG3 like
89d | CG56870-04 | 255 | 256 | NDRG3 like
89e | CG56870-05 | 257 | 258 | NDRG3 like
90a | CG59764-01 | 259 | 260 | Ferritin like
91a | CG59710-01 | 261 | 262 | P 14 like


92a | CG59754-02 | 263 | 264 | Downs syndrome cell adhesion molecule like
92b | CG59754-01 | 265 | 266 | Downs syndrome cell adhesion molecule like
 | CG59800-01 | 267 | 268 | HEPARAN SULFATE D-GLUCOSAMINYL 3-O-SULFOTRANSFERASE-3B like
94a | CG59761-01 | 269 | 270 | AXIN 1 (AXIS INHIBITION PROTEIN 1) (HAXIN) like
95a | CG59756-01 | 271 | 272 | JUNCTOPHILIN TYPE 2 like
96a | CG59708-01 | 273 | 274 | Ubiquitin carboxyl-terminal hydrolase 21 like
96b | CG59708-02 | 275 | 276 | Ubiquitin carboxyl-terminal hydrolase 21 like
96c | CG59708-03 | 277 | 278 | Ubiquitin carboxyl-terminal hydrolase 21 like
97a | CG59559-01 | 279 | 280 | BA12M19.1.3 like
98a | CG59669-01 |  | 282 | carbonyl reductase (called NADPH-dependent carbonyl reductase-like in patent)
99a | CG58624-01 | 283 | 284 | metal transporter
100a | CG59679-01 | 285 | 286 | carbonyl reductase
101a | CG59644-01 | 287 | 288 | CG12091 (putative protein phosphatase)
102a | CG59662-01 | 289 | 290 | Cyclophilin
103a | CG59773-01 | 291 | 292 | Myomegalin
103b | CG59773-02 | 293 | 294 | Myomegalin
103c | CG59773-03 | 295 | 296 | Myomegalin
104a | CG57460-01 | 297 | 298 | PEPTIDYL-PROLYL CIS-TRANS ISOMERASE like
105a | CG57464-01 | 299 | 300 | N-ACETYLTRANSFERASE like
106a | CG57466-01 | 301 | 302 | Acetylglucosaminyltransferase like
107a | CG57468-01 | 303 | 304 | ADv transporter like homo sapiens
108a | CG59609-01 | 305 | 306 | PEPTIDYL-PROLYL CIS-TRANS ISOMERASE A like
109a | CG59613-01 | 307 | 308 | Proliferating cell nuclear antigen like
110a | CG59619-01 | 309 | 310 | CYTOPLASMIC ACTIN 2 like


1 11a 


CG59621-01 


311 


312 


SELENOPHOSPHAT 
E SYNTHETASE like 


112a 


CG59625-01 


313 


314 


glucose transporter like 


113a 


CG59887-01 


315 


316 


Amino Acid/Metabolite 
Permease like 


113b 


CG5 98 87-02 


317 


318 


Amino Acid/Metabolite 
Permease like 


114a 


CG59861-01 


319 


320 


RTBULOSE-5- 
PHOSPHATE- 
EPMERASE like 


1 14b 


CG59861-02 


321 


322 


RJBULOSE-5- 
PHOSPHATE- 
EPIMERASE like 


115a 


CG59857-01 


323 


324 


Rhotekin like homo 
sapiens 


116a 


CG59855-01 


325 


326 


ATP SYNTHASE 
SUBUNIT C lik 


116b 


CG59855-02 


327 


328 


ATP SYNTHASE 
SUBUNTT C like 


117a 


CG5 9807-01 


329 


330 


Zinc finger like 


118a 


CG59805-01 


331 


332 


Zinc finger like 


119a 


CG59928-01 


333 


334 


Universal Stress (USP) 
Domain Containing 
Protein like 


120a 


CG59947-01 


335 


336 


VOLTAGE-GATED 
POTASSIUM 
CHANNEL PROTEIN 
KV3.3 (KSHULD) like 


121a 


CG5 993 8-01 

\^\J J77JO VI 


337 


338 


arylsulfatase like homo 
sapiens 


122a 


CG59746-01 


339 


340 


ubiquitin-specific 
processing protease 
like homo sapiens 


123a 


CG88613-01 


341 


342 


INOSITOL 1,4,5- 
TRISPHOSPHATE 3- 
KINASE 

ISOENZYME like 


124a 


CG59993-01 


343 


344 


synaptotagmin II like 


124b 


CG59993-02 


345 


346 


synaptotagmin II like 


125a 


CG5 999 1-01 


347 


348 


ooplasm specific 
protein like 


126a 


CG59987-01 


349 


350 


GTP-RHO binding 
protein 1 (rhophilin) 
like 




COS99R7-02 


351 


352 


GTP-RHO bindine 
protein 1 (rhophilin) 
like 


127a 


CG59971-01 


353 


354 


Leucine rich repeat 
(LRR) like 


127b 


CG59971-02 


355 


356 


Leucine rich repeat 
(LRR) like 






WO 02/072757 PCT/US02/06908 

Table 1 indicates homology of NOVX nucleic acids to known protein families. Thus, 
the nucleic acids and polypeptides, antibodies and related compounds according to the 
invention corresponding to a NOVX as identified in column 1 of Table 1 will be useful in 
therapeutic and diagnostic applications implicated in, for example, pathologies and disorders 
associated with the known protein families identified in column 5 of Table 1. 

NOVX nucleic acids and their encoded polypeptides are useful in a variety of 
applications and contexts. The various NOVX nucleic acids and polypeptides according to the 
invention are useful as novel members of the protein families according to the presence of 
domains and sequence relatedness to previously described proteins. Additionally, NOVX 
nucleic acids and polypeptides can also be used to identify proteins that are members of the 
family to which the NOVX polypeptides belong. 

Consistent with other known members of the family of proteins, identified in column 5 
of Table 1, the NOVX polypeptides of the present invention show homology to, and contain 
domains that are characteristic of, other members of such protein families. Details of the 
sequence relatedness and domain analysis for each NOVX are presented in Example A. 

The NOVX nucleic acids and polypeptides can also be used to screen for molecules 
that inhibit or enhance NOVX activity or function. Specifically, the nucleic acids and 
polypeptides according to the invention may be used as targets for the identification of small 
molecules that modulate or inhibit diseases associated with the protein families listed in Table 
1. 

The NOVX nucleic acids and polypeptides are also useful for detecting specific cell 
types. Details of the expression analysis for each NOVX are presented in Example C. 
Accordingly, the NOVX nucleic acids, polypeptides, antibodies and related compounds 
according to the invention will have diagnostic and therapeutic applications in the detection of 
a variety of diseases with differential expression in normal vs. diseased tissues, e.g., a variety of 
cancers. 

Additional utilities for NOVX nucleic acids and polypeptides according to the 
invention are disclosed herein. 

The present invention is based on the identification of biological macromolecules 
differentially modulated in a pathologic state, disease, or an abnormal condition or state. 
The pathologies or diseases of present interest include metabolic diseases including 
those related to endocrinologic disorders, cancers, various tumors and neoplasias, 
inflammatory disorders, central nervous system disorders, and similar abnormal conditions or 
states. In very significant embodiments of the present invention, the biological 
macromolecules implicated in the pathologies and conditions are proteins and polypeptides, 
and in such cases the present invention is related as well to the nucleic acids that encode them. 
Methods that may be employed to identify relevant biological macromolecules include any 
procedures that detect differential expression of nucleic acids encoding proteins and 
polypeptides associated with the disorder, as well as procedures that detect the respective 
proteins and polypeptides themselves. Significant methods that have been employed by the 
present inventors include GeneCalling® technology and SeqCalling™ technology, 
disclosed, respectively, in U.S. Patent No. 5,871,697 and in U.S. Ser. No. 09/417,386, filed 
Oct. 13, 1999, each of which is incorporated herein by reference in its entirety. GeneCalling® 
is also described in Shimkets, et al., "Gene expression analysis by transcript profiling coupled 
to a gene database query," Nature Biotechnology 17:798-803 (1999). 

The invention provides polypeptides, and nucleic acids encoding them, that have been 
identified as having novel associations with a disease or pathology, or an abnormal state or 
condition, in a mammal. The present invention further identifies a set of proteins and 
polypeptides, including naturally occurring polypeptides, precursor forms or proproteins, or 
mature forms of the polypeptides or proteins, which are implicated as targets for therapeutic 
agents in the treatment of various diseases, pathologies, abnormal states and conditions. A 
target may be employed in any of a variety of screening methodologies in order to identify 
candidate therapeutic agents which interact with the target and in so doing exert a desired or 
favorable effect. The candidate therapeutic agent is identified by screening a large collection 
of substances or compounds in an important embodiment of the invention. Such a collection 
may comprise a combinatorial library of substances or compounds in which, in at least one 
subset of substances or compounds, the individual members are related to each other by simple 
structural variations based on a particular canonical or basic chemical structure. The 
variations may include, by way of nonlimiting example, changes in length or identity of a 
basic framework of bonded atoms; changes in number, composition and disposition of ringed 
structures, bridge structures, alicyclic rings, and aromatic rings; and changes in pendent or 
substituent atoms or groups that are bonded at particular positions to the basic framework of 
bonded atoms or to the ringed structures, the bridge structures, the alicyclic structures, or the 
aromatic structures. 

A polypeptide or protein described herein, and that serves as a target in the screening 
procedure, includes the product of a naturally occurring polypeptide or precursor form or 
proprotein. The naturally occurring polypeptide, precursor or proprotein includes, e.g., the 
full-length gene product, encoded by the corresponding gene. The naturally occurring 
polypeptide also includes the polypeptide, precursor or proprotein encoded by an open reading 
frame described herein. A "mature" form of a polypeptide or protein arises as a result of one 
or more naturally occurring processing steps as they may occur within the cell, including a 
host cell. The processing steps occur as the gene product arises, e.g., via cleavage of the 
amino-terminal methionine residue encoded by the initiation codon of an open reading frame, 
or the proteolytic cleavage of a signal peptide or leader sequence. Thus, a mature form arising 
from a precursor polypeptide or protein that has residues 1 to N, where residue 1 is the N- 
terminal methionine, would have residues 2 through N remaining. Alternatively, a mature 
form arising from a precursor polypeptide or protein having residues 1 to N, in which an 
amino-terminal signal sequence from residue 1 to residue M is cleaved, includes the residues 
from residue M+1 to residue N remaining. A "mature" form of a polypeptide or protein may 
also arise from non-proteolytic post-translational modification. Such non-proteolytic processes 
include, e.g., glycosylation, myristylation or phosphorylation. In general, a mature polypeptide 
or protein may result from the operation of only one of these processes, or the combination of 
any of them. 
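
The two proteolytic cases described above can be sketched as simple string slices. This is a minimal illustration only: the function names and the 12-residue precursor are hypothetical, and the cleavage position M is assumed to be known in advance rather than predicted.

```python
def remove_initiator_met(precursor: str) -> str:
    """Return residues 2..N of a precursor whose residue 1 is the N-terminal methionine."""
    if not precursor.startswith("M"):
        raise ValueError("precursor does not begin with methionine")
    return precursor[1:]


def cleave_signal_peptide(precursor: str, m: int) -> str:
    """Return residues M+1..N after removal of a signal sequence spanning residues 1..M."""
    if not 0 < m < len(precursor):
        raise ValueError("M must satisfy 1 <= M < N")
    return precursor[m:]


# Hypothetical 12-residue precursor, used for illustration only.
precursor = "MKTLLVAGASQE"
print(remove_initiator_met(precursor))       # residues 2 through 12
print(cleave_signal_peptide(precursor, 7))   # residues 8 through 12
```

Non-proteolytic modifications such as glycosylation or phosphorylation do not change the residue range and are not modeled in this sketch.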

As used herein, "identical" residues correspond to those residues in a comparison 
between two sequences where the equivalent nucleotide base or amino acid residue in an 
alignment of two sequences is the same residue. Residues are alternatively described as 
"similar" or "positive" when the comparisons between two sequences in an alignment show 
that residues in an equivalent position in a comparison are either the same amino acid or a 
conserved amino acid as defined below. 

As used herein, a "chemical composition" relates to a composition including at least 
one compound that is either synthesized or extracted from a natural source. A chemical 
compound may be the product of a defined synthetic procedure. Such a synthesized 
compound is understood herein to have defined properties in terms of molecular formula, 
molecular structure relating the association of bonded atoms to each other, physical properties 
such as chromatographic or spectroscopic characterizations, and the like. A compound 
extracted from a natural source is advantageously analyzed by chemical and physical methods 
in order to provide a representation of its defined properties, including its molecular formula, 
molecular structure relating the association of bonded atoms to each other, physical properties 
such as chromatographic or spectroscopic characterizations, and the like. 

As used herein, a "candidate therapeutic agent" is a chemical compound that includes 
at least one substance shown to bind to a target biopolymer. In important embodiments of the 
invention, the target biopolymer is a protein or polypeptide, a nucleic acid, a polysaccharide or 
proteoglycan, or a lipid such as a complex lipid. The method of identifying compounds that 
bind to the target effectively eliminates compounds with little or no binding affinity, thereby 
increasing the potential that the identified chemical compound may have beneficial therapeutic 
applications. In cases where the "candidate therapeutic agent" is a mixture of more than one 
chemical compound, subsequent screening procedures may be carried out to identify the 
particular substance in the mixture that is the binding compound, and that is to be identified as 
a candidate therapeutic agent. 

As used herein, a "pharmaceutical agent" is provided by screening a candidate 
therapeutic agent using models for a disease state or pathology in order to identify a candidate 
exerting a desired or beneficial therapeutic effect with relation to the disease or pathology. 
Such a candidate that successfully provides such an effect is termed a pharmaceutical agent 
herein. Nonlimiting examples of model systems that may be used in such screens include 
particular cell lines, cultured cells, tissue preparations, whole tissues, organ preparations, 
intact organs, and nonhuman mammals. Screens employing at least one system, and 
preferably more than one system, may be employed in order to identify a pharmaceutical 
agent. Any pharmaceutical agent so identified may be pursued in further investigation using 
human subjects. 

NOVX Nucleic Acids and Polypeptides 
NOVX clones 

NOVX nucleic acids and their encoded polypeptides are useful in a variety of 
applications and contexts. The various NOVX nucleic acids and polypeptides according to the 
invention are useful as novel members of the protein families according to the presence of 
domains and sequence relatedness to previously described proteins. Additionally, NOVX 
nucleic acids and polypeptides can also be used to identify proteins that are members of the 
family to which the NOVX polypeptides belong. 

The NOVX genes and their corresponding encoded proteins are useful for preventing, 
treating or ameliorating medical conditions, e.g., by protein or gene therapy. Pathological 
conditions can be diagnosed by determining the amount of the new protein in a sample or by 
determining the presence of mutations in the new genes. Specific uses are described for each 
of the NOVX genes, based on the tissues in which they are most highly expressed. Uses 
include developing products for the diagnosis or treatment of a variety of diseases and 
disorders. 

The NOVX nucleic acids and proteins of the invention are useful in potential 
diagnostic and therapeutic applications and as a research tool. These include serving as a 
specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the 
presence or amount of the nucleic acid or the protein is to be assessed, as well as potential 
therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small molecule 
drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), 
(iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) a composition 
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon. 

In one specific embodiment, the invention includes an isolated polypeptide comprising 
an amino acid sequence selected from the group consisting of: (a) a mature form of the amino 
acid sequence selected from the group consisting of SEQ ID NO: 2n, wherein n is an integer 
between 1 and 178; (b) a variant of a mature form of the amino acid sequence selected from 
the group consisting of SEQ ID NO: 2n, wherein n is an integer between 1 and 178, wherein 
any amino acid in the mature form is changed to a different amino acid, provided that no more 
than 15% of the amino acid residues in the sequence of the mature form are so changed; (c) an 
amino acid sequence selected from the group consisting of SEQ ID NO: 2n, wherein n is an 
integer between 1 and 178; (d) a variant of the amino acid sequence selected from the group 
consisting of SEQ ID NO:2n, wherein n is an integer between 1 and 178 wherein any amino 
acid specified in the chosen sequence is changed to a different amino acid, provided that no 
more than 15% of the amino acid residues in the sequence are so changed; and (e) a fragment 
of any of (a) through (d). 
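
The "no more than 15%" limit on substituted residues can be checked with a position-by-position comparison. The sketch below assumes substitutions only (the two sequences must have the same length) and uses integer arithmetic to avoid rounding at the boundary. The function names and the 20-residue sequences are hypothetical.

```python
def count_changed(reference: str, variant: str) -> int:
    """Count positions at which the variant differs from the reference."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only compares sequences of equal length")
    return sum(1 for a, b in zip(reference, variant) if a != b)


def is_permitted_variant(reference: str, variant: str, max_percent: int = 15) -> bool:
    """True if no more than max_percent of the residues are changed."""
    if not reference:
        raise ValueError("reference sequence is empty")
    changed = count_changed(reference, variant)
    return changed * 100 <= max_percent * len(reference)


# Hypothetical 20-residue sequences: 3 of 20 changed is 15%, 4 of 20 is 20%.
reference = "ACDEFGHIKLMNPQRSTVWY"
three_changes = "GGGEFGHIKLMNPQRSTVWY"
four_changes = "GGGGFGHIKLMNPQRSTVWY"
print(is_permitted_variant(reference, three_changes))
print(is_permitted_variant(reference, four_changes))
```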

In another specific embodiment, the invention includes an isolated nucleic acid 
molecule comprising a nucleic acid sequence encoding a polypeptide comprising an amino 
acid sequence selected from the group consisting of: (a) a mature form of the amino acid 
sequence given by SEQ ID NO: 2n, wherein n is an integer between 1 and 178; (b) a variant of a 
mature form of the amino acid sequence selected from the group consisting of SEQ ID NO: 
2n, wherein n is an integer between 1 and 178 wherein any amino acid in the mature form of 
the chosen sequence is changed to a different amino acid, provided that no more than 15% of 
the amino acid residues in the sequence of the mature form are so changed; (c) the amino acid 
sequence selected from the group consisting of SEQ ID NO: 2n, wherein n is an integer 
between 1 and 178; (d) a variant of the amino acid sequence selected from the group 
consisting of SEQ ID NO: 2n, wherein n is an integer between 1 and 178, in which any amino 
acid specified in the chosen sequence is changed to a different amino acid, provided that no 
more than 15% of the amino acid residues in the sequence are so changed; (e) a nucleic acid 
fragment encoding at least a portion of a polypeptide comprising the amino acid sequence 
selected from the group consisting of SEQ ID NO: 2n, wherein n is an integer between 1 and 
178 or any variant of said polypeptide wherein any amino acid of the chosen sequence is 
changed to a different amino acid, provided that no more than 10% of the amino acid residues 
in the sequence are so changed; and (f) the complement of any of said nucleic acid molecules. 

In yet another specific embodiment, the invention includes an isolated nucleic acid 
molecule, wherein said nucleic acid molecule comprises a nucleotide sequence selected from 
the group consisting of: (a) the nucleotide sequence selected from the group consisting of 
SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178; (b) a nucleotide sequence 
wherein one or more nucleotides in the nucleotide sequence selected from the group consisting 
of SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178, is changed from that 
selected from the group consisting of the chosen sequence to a different nucleotide provided 
that no more than 15% of the nucleotides are so changed; (c) a nucleic acid fragment of the 
sequence selected from the group consisting of SEQ ID NO: 2n-1, wherein n is an integer 
between 1 and 178; and (d) a nucleic acid fragment wherein one or more nucleotides in the 
nucleotide sequence selected from the group consisting of SEQ ID NO: 2n-1, wherein n is an 
integer between 1 and 178, is changed from that selected from the group consisting of the 
chosen sequence to a different nucleotide provided that no more than 15% of the nucleotides 
are so changed. 
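
The numbering convention used throughout (nucleotide sequences at SEQ ID NO: 2n-1, polypeptide sequences at SEQ ID NO: 2n) can be sketched as follows. The function names are hypothetical, and the range "between 1 and 178" is assumed here to be inclusive.

```python
MAX_N = 178  # upper bound on n stated in the text; assumed inclusive


def seq_id_pair(n: int) -> tuple[int, int]:
    """Return (nucleic acid SEQ ID NO, polypeptide SEQ ID NO) for a given n."""
    if not 1 <= n <= MAX_N:
        raise ValueError("n must be between 1 and 178")
    return 2 * n - 1, 2 * n


def n_from_seq_id(seq_id: int) -> int:
    """Recover n from either member of a SEQ ID NO pair."""
    if not 1 <= seq_id <= 2 * MAX_N:
        raise ValueError("SEQ ID NO out of range")
    return (seq_id + 1) // 2


# The Table 1 entry listing SEQ ID NOs 249 and 250 corresponds to n = 125.
print(seq_id_pair(125))
print(n_from_seq_id(249), n_from_seq_id(250))
```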

One aspect of the invention pertains to isolated nucleic acid molecules that encode 
NOVX polypeptides or biologically active portions thereof. Also included in the invention are 
nucleic acid fragments sufficient for use as hybridization probes to identify NOVX-encoding 
nucleic acids (e.g., NOVX mRNAs) and fragments for use as PCR primers for the 
amplification and/or mutation of NOVX nucleic acid molecules. As used herein, the term 
"nucleic acid molecule" is intended to include DNA molecules (e.g., cDNA or genomic 
DNA), RNA molecules (e.g., mRNA), analogs of the DNA or RNA generated using 
nucleotide analogs, and derivatives, fragments and homologs thereof. The nucleic acid 
molecule may be single-stranded or double-stranded, but preferably comprises 
double-stranded DNA. 

An NOVX nucleic acid can encode a mature NOVX polypeptide. As used herein, a 
"mature" form of a polypeptide or protein disclosed in the present invention is the product of a 
naturally occurring polypeptide or precursor form or proprotein. The naturally occurring 
polypeptide, precursor or proprotein includes, by way of nonlimiting example, the full-length 
gene product, encoded by the corresponding gene. Alternatively, it may be defined as the 
polypeptide, precursor or proprotein encoded by an ORF described herein. The 
"mature" form arises, again by way of nonlimiting example, as a result of one or more 
naturally occurring processing steps as they may take place within the cell, or host cell, in 
which the gene product arises. Examples of such processing steps leading to a "mature" form 
of a polypeptide or protein include the cleavage of the N-terminal methionine residue encoded 
by the initiation codon of an ORF, or the proteolytic cleavage of a signal peptide or leader 
sequence. Thus a mature form arising from a precursor polypeptide or protein that has 
residues 1 to N, where residue 1 is the N-terminal methionine, would have residues 2 through 
N remaining after removal of the N-terminal methionine. Alternatively, a mature form arising 
from a precursor polypeptide or protein having residues 1 to N, in which an N-terminal signal 
sequence from residue 1 to residue M is cleaved, would have the residues from residue M+1 to 
residue N remaining. Further as used herein, a "mature" form of a polypeptide or protein may 
arise from a step of post-translational modification other than a proteolytic cleavage event. 
Such additional processes include, by way of nonlimiting example, glycosylation, 
myristoylation or phosphorylation. In general, a mature polypeptide or protein may result 
from the operation of only one of these processes, or a combination of any of them. 

The term "probes", as utilized herein, refers to nucleic acid sequences of variable 
length, preferably from at least about 10 nucleotides (nt) or 100 nt to as many as 
approximately, e.g., 6,000 nt, depending upon the specific use. Probes are used in the 
detection of identical, similar, or complementary nucleic acid sequences. Longer-length 
probes are generally obtained from a natural or recombinant source, are highly specific, and 
are much slower to hybridize than shorter-length oligomer probes. Probes may be single- or 
double-stranded and designed to have specificity in PCR, membrane-based hybridization 
technologies, or ELISA-like technologies. 

The term "isolated" nucleic acid molecule, as utilized herein, is one that is separated 
from other nucleic acid molecules which are present in the natural source of the nucleic acid. 
Preferably, an "isolated" nucleic acid is free of sequences which naturally flank the nucleic 
acid (i.e., sequences located at the 5'- and 3'-termini of the nucleic acid) in the genomic DNA 
of the organism from which the nucleic acid is derived. For example, in various embodiments, 
the isolated NOVX nucleic acid molecules can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 
kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic acid molecule in 
genomic DNA of the cell/tissue from which the nucleic acid is derived (e.g., brain, heart, liver, 
spleen, etc.). Moreover, an "isolated" nucleic acid molecule, such as a cDNA molecule, can 
be substantially free of other cellular material or culture medium when produced by 
recombinant techniques, or of chemical precursors or other chemicals when chemically 
synthesized. 

A nucleic acid molecule of the invention, e.g., a nucleic acid molecule having the nucleotide 
sequence SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178, or a complement of 
this aforementioned nucleotide sequence, can be isolated using standard molecular biology 
techniques and the sequence information provided herein. Using all or a portion of the nucleic 
acid sequence of SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178, as a 
hybridization probe, NOVX molecules can be isolated using standard hybridization and 
cloning techniques (e.g., as described in Sambrook, et al., (eds.), Molecular Cloning: A 
Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 
NY, 1989; and Ausubel, et al., (eds.), Current Protocols in Molecular Biology, John 
Wiley & Sons, New York, NY, 1993). 

A nucleic acid of the invention can be amplified using cDNA, mRNA or alternatively, 
genomic DNA, as a template and appropriate oligonucleotide primers according to standard 
PCR amplification techniques. The nucleic acid so amplified can be cloned into an 
appropriate vector and characterized by DNA sequence analysis. Furthermore, 
oligonucleotides corresponding to NOVX nucleotide sequences can be prepared by standard 
synthetic techniques, e.g., using an automated DNA synthesizer. 

As used herein, the term "oligonucleotide" refers to a series of linked nucleotide 
residues, which oligonucleotide has a sufficient number of nucleotide bases to be used in a 
PCR reaction. A short oligonucleotide sequence may be based on, or designed from, a 
genomic or cDNA sequence and is used to amplify, confirm, or reveal the presence of an 
identical, similar or complementary DNA or RNA in a particular cell or tissue. 
Oligonucleotides comprise portions of a nucleic acid sequence about 10 nt, 50 nt, or 
100 nt in length, preferably about 15 nt to 30 nt in length. In one embodiment of the 
invention, an oligonucleotide comprising a nucleic acid molecule less than 100 nt in length 
would further comprise at least 6 contiguous nucleotides of SEQ ID NO: 2n-1, wherein n is an 
integer between 1 and 178, or a complement thereof. Oligonucleotides may be chemically 
synthesized and may also be used as probes. 

In another embodiment, an isolated nucleic acid molecule of the invention comprises a 
nucleic acid molecule that is a complement of the nucleotide sequence from the group 
consisting of SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178, or a portion of 
this nucleotide sequence (e.g., a fragment that can be used as a probe or primer or a fragment 
encoding a biologically-active portion of an NOVX polypeptide). A nucleic acid molecule 
that is complementary to the nucleotide sequence from the group consisting of SEQ ID NO: 
2n-1, wherein n is an integer between 1 and 178, is one that is sufficiently complementary to 
the nucleotide sequence from the group consisting of SEQ ID NO: 2n-1, wherein n is an 
integer between 1 and 178, that it can hydrogen bond with few or no mismatches to the 
nucleotide sequence from the group consisting of SEQ ID NO: 2n-1, wherein n is an integer 
between 1 and 178, thereby forming a stable duplex. 
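
A complement with "little or no mismatches" can be illustrated by computing the Watson-Crick reverse complement of a strand and counting positions at which a candidate strand fails to pair. This sketch assumes an unambiguous DNA alphabet (A, C, G, T) and equal-length strands; Hoogsteen pairing and duplex stability are not modeled, and the function names and the 9-nt sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_mismatches(strand: str, candidate: str) -> int:
    """Count positions at which candidate fails to pair with strand in antiparallel orientation."""
    expected = reverse_complement(strand)
    if len(expected) != len(candidate):
        raise ValueError("this sketch only compares strands of equal length")
    return sum(1 for a, b in zip(expected, candidate.upper()) if a != b)


# Hypothetical 9-nt strand, for illustration only.
strand = "ATGGCCTAA"
print(reverse_complement(strand))
print(count_mismatches(strand, "TTAGGCCAT"))  # perfect complement
print(count_mismatches(strand, "TTAGGCCAA"))  # one mismatched position
```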

As used herein, the term "complementary" refers to Watson-Crick or Hoogsteen base 
pairing between nucleotide units of a nucleic acid molecule, and the term "binding" means 
the physical or chemical interaction between two polypeptides or compounds or associated 
polypeptides or compounds or combinations thereof. Binding includes ionic, non-ionic, van 
der Waals, hydrophobic interactions, and the like. A physical interaction can be either direct 
or indirect. Indirect interactions may be through or due to the effects of another polypeptide or 
compound. Direct binding refers to interactions that do not take place through, or due to, the 
effect of another polypeptide or compound, but instead are without other substantial chemical 
intermediates. 

Fragments provided herein are defined as sequences of at least 6 (contiguous) nucleic 
acids or at least 4 (contiguous) amino acids, a length sufficient to allow for specific 
hybridization in the case of nucleic acids or for specific recognition of an epitope in the case of 
amino acids, respectively, and are at most some portion less than a full length sequence. 
Fragments may be derived from any contiguous portion of a nucleic acid or amino acid 
sequence of choice. Derivatives are nucleic acid sequences or amino acid sequences formed 
from the native compounds either directly or by modification or partial substitution. Analogs 
are nucleic acid sequences or amino acid sequences that have a structure similar to, but not 
identical to, the native compound, differing from it in respect to certain components or side 
chains. Analogs may be synthetic or from a different evolutionary origin and may have a 
similar or opposite metabolic activity compared to wild type. Homologs are nucleic acid 
sequences or amino acid sequences of a particular gene that are derived from different species. 

A full-length NOVX clone is identified as containing an ATG translation start codon 
and an in-frame stop codon. Any disclosed NOVX nucleotide sequence lacking an ATG start 
codon therefore encodes a truncated C-terminal fragment of the respective NOVX 
polypeptide, and requires that the corresponding full-length cDNA extend in the 5' direction 
of the disclosed sequence. Any disclosed NOVX nucleotide sequence lacking an in-frame 
stop codon similarly encodes a truncated N-terminal fragment of the respective NOVX 
polypeptide, and requires that the corresponding full-length cDNA extend in the 3' direction 
of the disclosed sequence. 

Derivatives and analogs may be full length or other than full length, if the derivative or 
analog contains a modified nucleic acid or amino acid, as described below. Derivatives or 
analogs of the nucleic acids or proteins of the invention include, but are not limited to, 
molecules comprising regions that are substantially homologous to the nucleic acids or 
proteins of the invention, in various embodiments, by at least about 70%, 80%, or 95% 
identity (with a preferred identity of 80-95%) over a nucleic acid or amino acid sequence of 
identical size or when compared to an aligned sequence in which the alignment is done by a 
computer homology program known in the art, or whose encoding nucleic acid is capable of 
hybridizing to the complement of a sequence encoding the aforementioned proteins under 
stringent, moderately stringent, or low stringent conditions. See e.g. Ausubel, et al., Current 
Protocols in Molecular Biology, John Wiley & Sons, New York, NY, 1993, and below. 
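
The identity thresholds named above (about 70%, 80%, or 95%) presuppose a way of scoring an alignment. The sketch below computes percent identity over two sequences that have already been aligned by some other program. The choice of denominator (all alignment columns except those where both sequences carry a gap) is an assumption made for this illustration; homology programs differ on this point, and the text does not specify one.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns in which both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no scorable columns")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


def meets_identity(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the aligned sequences reach the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical pre-aligned sequences: 9 of 10 columns identical.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
print(meets_identity("ACDEFGHIKL", "ACDEFGHIKV", threshold=95.0))
```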

A "homologous nucleic acid sequence" or "homologous amino acid sequence," or 
variations thereof, refer to sequences characterized by a homology at the nucleotide level or 
amino acid level as discussed above. Homologous nucleotide sequences include those 
sequences coding for isoforms of NOVX polypeptides. Isoforms can be expressed in different 
tissues of the same organism as a result of, for example, alternative splicing of RNA. 
Alternatively, isoforms can be encoded by different genes. In the invention, homologous 
nucleotide sequences include nucleotide sequences encoding for an NOVX polypeptide of 
species other than humans, including, but not limited to: vertebrates, and thus can include, e.g., 
frog, mouse, rat, rabbit, dog, cat, cow, horse, and other organisms. Homologous nucleotide 
sequences also include, but are not limited to, naturally occurring allelic variations and 
mutations of the nucleotide sequences set forth herein. A homologous nucleotide sequence 
does not, however, include the exact nucleotide sequence encoding human NOVX protein. 
Homologous nucleic acid sequences include those nucleic acid sequences that encode 
conservative amino acid substitutions (see below) in SEQ ID NO: 2n-1, wherein n is an 
integer between 1 and 178, as well as a polypeptide possessing NOVX biological activity. 
Various biological activities of the NOVX proteins are described below. 

An NOVX polypeptide is encoded by the open reading frame ("ORF") of an NOVX 
nucleic acid. An ORF corresponds to a nucleotide sequence that could potentially be translated 
into a polypeptide. A stretch of nucleic acids comprising an ORF is uninterrupted by a stop 
codon. An ORF that represents the coding sequence for a full protein begins with an ATG 
"start" codon and terminates with one of the three "stop" codons, namely, TAA, TAG, or 
TGA. For the purposes of this invention, an ORF may be any part of a coding sequence, with 
or without a start codon, a stop codon, or both. For an ORF to be considered as a good 
candidate for coding for a bona fide cellular protein, a minimum size requirement is often set, 
e.g., a stretch of DNA that would encode a protein of 50 amino acids or more. 
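
The definition of a complete ORF (an ATG start codon, no intervening stop codon, and a terminating TAA, TAG, or TGA) together with the 50-amino-acid size filter can be sketched as a three-frame scan. This is a simplified illustration: it scans the forward strand only, reports only complete ORFs, and ignores the partial ORFs that the text also permits. The function name and the short test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 50) -> list[tuple[int, int]]:
    """Return (start, end) offsets of ATG-to-stop ORFs on the forward strand.

    start is the 0-based index of the A in ATG; end is one past the stop codon.
    min_codons counts amino-acid-encoding codons, excluding the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # resume the search after the stop codon
    return orfs


# Hypothetical 21-nt sequence with one 5-codon ORF; the size filter is lowered to show it.
example = "CCATGGCTGCTGCTGCTTAAG"
print(find_orfs(example, min_codons=5))
print(find_orfs(example))  # default 50-codon minimum filters it out
```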

The nucleotide sequences determined from the cloning of the human NOVX genes 
allow for the generation of probes and primers designed for use in identifying and/or cloning 
NOVX homologues in other cell types, e.g., from other tissues, as well as NOVX homologues 
from other vertebrates. The probe/primer typically comprises a substantially purified 
oligonucleotide. The oligonucleotide typically comprises a region of nucleotide sequence that 
hybridizes under stringent conditions to at least about 12, 25, 50, 100, 150, 200, 250, 300, 350 
or 400 consecutive nucleotides of a sense strand nucleotide sequence of SEQ ID NO: 2n-1, 
wherein n is an integer between 1 and 178; or of an anti-sense strand nucleotide sequence of 
SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178. 

Probes based on the human NOVX nucleotide sequences can be used to detect 
transcripts or genomic sequences encoding the same or homologous proteins. In various 
embodiments, the probe further comprises a label group attached thereto, e.g. the label group 
can be a radioisotope, a fluorescent compound, an enzyme, or an enzyme co-factor. Such 
probes can be used as a part of a diagnostic test kit for identifying cells or tissues which 
mis-express an NOVX protein, such as by measuring a level of an NOVX-encoding nucleic acid in 
a sample of cells from a subject, e.g., detecting NOVX mRNA levels or determining whether a 
genomic NOVX gene has been mutated or deleted. 

"A polypeptide having a biologically-active portion of an NOVX polypeptide" refers 
to polypeptides exhibiting activity similar, but not necessarily identical to, an activity of a 
polypeptide of the invention, including mature forms, as measured in a particular biological 
assay, with or without dose dependency. A nucleic acid fragment encoding a 
"biologically-active portion of NOVX" can be prepared by isolating a portion of SEQ ID NO: 2n-1, 
wherein n is an integer between 1 and 178, that encodes a polypeptide having an NOVX biological 
activity (the biological activities of the NOVX proteins are described below), expressing the 
encoded portion of NOVX protein (e.g., by recombinant expression in vitro) and assessing the 
activity of the encoded portion of NOVX. 

NOVX Nucleic Acid and Polypeptide Variants 

The invention further encompasses nucleic acid molecules that differ from the 
nucleotide sequences shown in SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178, 
due to degeneracy of the genetic code and thus encode the same NOVX proteins as that 
encoded by the nucleotide sequences shown in SEQ ID NO: 2n-1, wherein n is an integer 
between 1 and 178. In another embodiment, an isolated nucleic acid molecule of the invention 
has a nucleotide sequence encoding a protein having an amino acid sequence shown in SEQ 
ID NO: 2n, wherein n is an integer between 1 and 178. 

In addition to the human NOVX nucleotide sequences shown in SEQ ID NO: 2n-1, 
wherein n is an integer between 1 and 178, it will be appreciated by those skilled in the art that 
DNA sequence polymorphisms that lead to changes in the amino acid sequences of the NOVX 
polypeptides may exist within a population (e.g., the human population). Such genetic 
polymorphism in the NOVX genes may exist among individuals within a population due to 
natural allelic variation. As used herein, the terms "gene" and "recombinant gene" refer to 
nucleic acid molecules comprising an open reading frame (ORF) encoding an NOVX protein, 
preferably a vertebrate NOVX protein. Such natural allelic variations can typically result in 
1-5% variance in the nucleotide sequence of the NOVX genes. Any and all such nucleotide 
variations and resulting amino acid polymorphisms in the NOVX polypeptides, which are the 
result of natural allelic variation and that do not alter the functional activity of the NOVX 
polypeptides, are intended to be within the scope of the invention. 

Moreover, nucleic acid molecules encoding NOVX proteins from other species, and 
thus that have a nucleotide sequence that differs from the human SEQ ID NO: 2n-1, wherein n 
is an integer between 1 and 178, are intended to be within the scope of the invention. Nucleic 
acid molecules corresponding to natural allelic variants and homologues of the NOVX cDNAs 
of the invention can be isolated based on their homology to the human NOVX nucleic acids 
disclosed herein using the human cDNAs, or a portion thereof, as a hybridization probe 
according to standard hybridization techniques under stringent hybridization conditions. 

Accordingly, in another embodiment, an isolated nucleic acid molecule of the 
invention is at least 6 nucleotides in length and hybridizes under stringent conditions to the 
nucleic acid molecule comprising the nucleotide sequence of SEQ ID NO: 2n-1, wherein n is 
an integer between 1 and 178. In another embodiment, the nucleic acid is at least 10, 25, 50, 
100, 250, 500, 750, 1000, 1500, or 2000 or more nucleotides in length. In yet another 
embodiment, an isolated nucleic acid molecule of the invention hybridizes to the coding 
region. As used herein, the term "hybridizes under stringent conditions" is intended to 
describe conditions for hybridization and washing under which nucleotide sequences at least 
60% homologous to each other typically remain hybridized to each other. 

Homologs (i.e., nucleic acids encoding NOVX proteins derived from species other 
than human) or other related sequences (e.g., paralogs) can be obtained by low, moderate or 
high stringency hybridization with all or a portion of the particular human sequence as a probe 
using methods well known in the art for nucleic acid hybridization and cloning. 

As used herein, the phrase "stringent hybridization conditions" refers to conditions 
under which a probe, primer or oligonucleotide will hybridize to its target sequence, but to no 
other sequences. Stringent conditions are sequence-dependent and will be different in 
different circumstances. Longer sequences hybridize specifically at higher temperatures than 
shorter sequences. Generally, stringent conditions are selected to be about 5 °C lower than the 
thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The 
Tm is the temperature (under defined ionic strength, pH and nucleic acid concentration) at 
which 50% of the probes complementary to the target sequence hybridize to the target 
sequence at equilibrium. Since the target sequences are generally present in excess, at Tm, 
50% of the probes are occupied at equilibrium. Typically, stringent conditions will be those in 
which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M 
sodium ion (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for 
short probes, primers or 
oligonucleotides (e.g., 10 nt to 50 nt) and at least about 60°C for longer probes, primers and 
oligonucleotides. Stringent conditions may also be achieved with the addition of destabilizing 
agents, such as formamide. 
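
The relationship between Tm and a stringent temperature can be sketched as follows. The text does not give a formula for Tm, so the sketch assumes the Wallace rule (about 2°C per A or T and 4°C per G or C), a widely used rough approximation for short oligonucleotides that ignores the salt, pH and concentration dependence described above. The function names `wallace_tm` and `stringent_temperature` are hypothetical.

```python
def wallace_tm(oligo):
    """Rough Tm estimate in degrees C for a short DNA oligonucleotide.

    Assumption: Wallace rule, Tm = 2*(A+T) + 4*(G+C). This is only a
    rule of thumb and is not taken from the specification.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(oligo, offset=5.0):
    """Temperature about `offset` degrees C below the estimated Tm."""
    return wallace_tm(oligo) - offset
```

A 12-mer such as `"ATGCATGCATGC"` has an estimated Tm of 36°C under this rule, so the sketch suggests a stringent temperature of about 31°C. Measured values depend on the buffer and should be determined empirically.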

Stringent conditions are known to those skilled in the art and can be found in Ausubel, 
et al., (eds.), Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. 
(1989), 6.3.1-6.3.6. Preferably, the conditions are such that sequences at least about 65%, 
70%, 75%, 85%, 90%, 95%, 98%, or 99% homologous to each other typically remain 
hybridized to each other. A non-limiting example of stringent hybridization conditions is 
hybridization in a high salt buffer comprising 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM 
EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and 500 mg/ml denatured salmon sperm DNA 
at 65°C, followed by one or more washes in 0.2X SSC, 0.01% BSA at 50°C. An isolated 
nucleic acid molecule of the invention that hybridizes under stringent conditions to the 
sequences of SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178, corresponds to a 
naturally-occurring nucleic acid molecule. As used herein, a "naturally-occurring" nucleic 
acid molecule refers to an RNA or DNA molecule having a nucleotide sequence that occurs in 
nature (e.g., encodes a natural protein). 

In a second embodiment, a nucleic acid sequence that is hybridizable to the nucleic 
acid molecule comprising the nucleotide sequence of SEQ ID NO: 2n-1, wherein n is an 
integer between 1 and 178, or fragments, analogs or derivatives thereof, under conditions of 
moderate stringency is provided. A non-limiting example of moderate stringency 
hybridization conditions is hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS and 
100 mg/ml denatured salmon sperm DNA at 55°C, followed by one or more washes in 
1X SSC, 0.1% SDS at 37°C. Other conditions of moderate stringency that may be used are 
well-known within the art. See, e.g., Ausubel, et al. (eds.), 1993, Current Protocols in 
Molecular Biology, John Wiley & Sons, NY, and Kriegler, 1990, Gene Transfer and 
Expression, A Laboratory Manual, Stockton Press, NY. 

In a third embodiment, a nucleic acid that is hybridizable to the nucleic acid molecule 
comprising the nucleotide sequences of SEQ ID NO: 2n-1, wherein n is an integer between 1 and 
178, or fragments, analogs or derivatives thereof, under conditions of low stringency, is 
provided. A non-limiting example of low stringency hybridization conditions are 
hybridization in 35% formamide, 5X SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.02% 
PVP, 0.02% Ficoll, 0.2% BSA, 100 mg/ml denatured salmon sperm DNA, 10% (wt/vol) 
dextran sulfate at 40°C, followed by one or more washes in 2X SSC, 25 mM Tris-HCl (pH 
7.4), 5 mM EDTA, and 0.1% SDS at 50°C. Other conditions of low stringency that may be 
used are well known in the art (e.g., as employed for cross-species hybridizations). See, e.g., 
Ausubel, et al. (eds.), 1993, Current Protocols in Molecular Biology, John Wiley & 
Sons, NY, and Kriegler, 1990, Gene Transfer and Expression, A Laboratory Manual, 
Stockton Press, NY; Shilo and Weinberg, 1981. Proc Natl Acad Sci USA 78: 6789-6792. 

Conservative Mutations 

In addition to naturally-occurring allelic variants of NOVX sequences that may exist in 
the population, the skilled artisan will further appreciate that changes can be introduced by 
mutation into the nucleotide sequences of SEQ ID NO: 2n-1, wherein n is an integer between 1 
and 178, thereby leading to changes in the amino acid sequences of the encoded NOVX 
proteins, without altering the functional ability of said NOVX proteins. For example, 
nucleotide substitutions leading to amino acid substitutions at "non-essential" amino acid 
residues can be made in the sequence SEQ ID NO: 2n, wherein n is an integer between 1 and 
178. A "non-essential" amino acid residue is a residue that can be altered from the wild-type 
sequences of the NOVX proteins without altering their biological activity, whereas an 
"essential" amino acid residue is required for such biological activity. For example, amino 
acid residues that are conserved among the NOVX proteins of the invention are predicted to be 
particularly non-amenable to alteration. Amino acids for which conservative substitutions can 
be made are well-known within the art. 

Another aspect of the invention pertains to nucleic acid molecules encoding NOVX 
proteins that contain changes in amino acid residues that are not essential for activity. Such 
NOVX proteins differ in amino acid sequence from SEQ ID NO: 2n-1, wherein n is an integer 
between 1 and 178 yet retain biological activity. In one embodiment, the isolated nucleic acid 
molecule comprises a nucleotide sequence encoding a protein, wherein the protein comprises 
an amino acid sequence at least about 45% homologous to the amino acid sequences SEQ ID 
NO: 2n, wherein n is an integer between 1 and 178. Preferably, the protein encoded by the 
nucleic acid molecule is at least about 60% homologous to SEQ ID NO: 2n, wherein n is an 
integer between 1 and 178; more preferably at least about 70% homologous to SEQ ID NO: 2n, 
wherein n is an integer between 1 and 178; still more preferably at least about 80% 
homologous to SEQ ID NO: 2n, wherein n is an integer between 1 and 178; even more 
preferably at least about 90% homologous to SEQ ID NO: 2n, wherein n is an integer between 
1 and 178; and most preferably at least about 95% homologous to SEQ ID NO: 2n, wherein n 
is an integer between 1 and 178. 

An isolated nucleic acid molecule encoding an NOVX protein homologous to the 
protein of SEQ ID NO: 2n, wherein n is an integer between 1 and 178 can be created by 
introducing one or more nucleotide substitutions, additions or deletions into the nucleotide 
sequence of SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178, such that one or 
more amino acid substitutions, additions or deletions are introduced into the encoded protein. 

Mutations can be introduced into SEQ ID NO: 2n-1, wherein n is an integer between 1 
and 178, by standard techniques, such as site-directed mutagenesis and PCR-mediated 
mutagenesis. Preferably, conservative amino acid substitutions are made at one or more 
predicted, non-essential amino acid residues. A "conservative amino acid substitution" is one 
in which the amino acid residue is replaced with an amino acid residue having a similar side 
chain. Families of amino acid residues having similar side chains have been defined within 
the art. These families include amino acids with basic side chains (e.g., lysine, arginine, 
histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains 
(e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side 
chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, 
tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side 
chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Thus, a predicted non-essential 
amino acid residue in the NOVX protein is replaced with another amino acid residue from the 
same side chain family. Alternatively, in another embodiment, mutations can be introduced 
randomly along all or part of an NOVX coding sequence, such as by saturation mutagenesis, 
and the resultant mutants can be screened for NOVX biological activity to identify mutants 
that retain activity. Following mutagenesis of SEQ ID NO: 2n-1, wherein n is an integer between 
1 and 178, the encoded protein can be expressed by any recombinant technology known in the 
art and the activity of the protein can be determined. 
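
The side chain families listed above lend themselves to a simple lookup. The following minimal sketch encodes those families with one-letter amino acid codes and reports whether a proposed substitution stays within at least one family. The names `SIDE_CHAIN_FAMILIES`, `shared_families` and `is_conservative` are hypothetical and used only for illustration.

```python
# Side chain families as listed in the text, in one-letter amino acid codes.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),               # lysine, arginine, histidine
    "acidic": set("DE"),               # aspartic acid, glutamic acid
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def shared_families(original, replacement):
    """Return the sorted names of families containing both residues."""
    return sorted(
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if original in members and replacement in members
    )


def is_conservative(original, replacement):
    """True if the two residues differ and share at least one family."""
    return original != replacement and bool(shared_families(original, replacement))
```

Under this sketch, replacing lysine with arginine (`is_conservative("K", "R")`) counts as conservative, whereas replacing aspartic acid with lysine does not. Because some residues belong to more than one family, `shared_families` may return several names.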

The relatedness of amino acid families may also be determined based on side chain 
interactions. Substituted amino acids may be fully conserved "strong" residues or fully 
conserved "weak" residues. The "strong" group of conserved amino acid residues may be any 
one of the following groups: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, FYW, 
wherein the single letter amino acid codes are grouped by those amino acids that may be 
substituted for each other. Likewise, the "weak" group of conserved residues may be any one 
of the following: CSA, ATV, SAG, STNK, STPA, SGND, SNDEQK, NDEQHK, NEQHRK, 
HFY, wherein the letters within each group represent the single letter amino acid code. 
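
As an illustration of how the "strong" and "weak" groups above might be applied, the sketch below classifies a pair of residues by the first kind of group that contains both. The group lists are copied from the text; the function name `classify_substitution` and its return labels are hypothetical.

```python
# Groups copied from the text (single-letter amino acid codes).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "HFY"]


def _share_group(a, b, groups):
    """True if any group contains both residues."""
    return any(a in group and b in group for group in groups)


def classify_substitution(a, b):
    """Label a residue pair as 'identical', 'strong', 'weak' or 'none'."""
    if a == b:
        return "identical"
    if _share_group(a, b, STRONG_GROUPS):
        return "strong"
    if _share_group(a, b, WEAK_GROUPS):
        return "weak"
    return "none"
```

For instance, serine to threonine falls in the strong group STA, cysteine to serine falls only in the weak group CSA, and tryptophan to glycine shares no group.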

In one embodiment, a mutant NOVX protein can be assayed for (i) the ability to form 
protein:protein interactions with other NOVX proteins, other cell-surface proteins, or 
biologically-active portions thereof; (ii) complex formation between a mutant NOVX protein 
and an NOVX ligand; or (iii) the ability of a mutant NOVX protein to bind to an intracellular 
target protein or biologically-active portion thereof (e.g., avidin proteins). 

In yet another embodiment, a mutant NOVX protein can be assayed for the ability to 
regulate a specific biological function (e.g., regulation of insulin release). 

Antisense Nucleic Acids 

Another aspect of the invention pertains to isolated antisense nucleic acid molecules 
that are hybridizable to or complementary to the nucleic acid molecule comprising the 
nucleotide sequence of SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178, or 
fragments, analogs or derivatives thereof. An "antisense" nucleic acid comprises a nucleotide 
sequence that is complementary to a "sense" nucleic acid encoding a protein (e.g., 
complementary to the coding strand of a double-stranded cDNA molecule or complementary 
to an mRNA sequence). In specific aspects, antisense nucleic acid molecules are provided that 
comprise a sequence complementary to at least about 10, 25, 50, 100, 250 or 500 nucleotides 
or an entire NOVX coding strand, or to only a portion thereof. Nucleic acid molecules 
encoding fragments, homologs, derivatives and analogs of an NOVX protein of SEQ ID NO: 
2n, wherein n is an integer between 1 and 178, or antisense nucleic acids complementary to an 
NOVX nucleic acid sequence of SEQ ID NO: 2n-1, wherein n is an integer between 1 and 
178, are additionally provided. 

In one embodiment, an antisense nucleic acid molecule is antisense to a "coding 
region" of the coding strand of a nucleotide sequence encoding an NOVX protein. The term 
"coding region" refers to the region of the nucleotide sequence comprising codons which are 
translated into amino acid residues. In another embodiment, the antisense nucleic acid 
molecule is antisense to a "noncoding region" of the coding strand of a nucleotide sequence 
encoding the NOVX protein. The term "noncoding region" refers to 5' and 3' sequences which 
flank the coding region that are not translated into amino acids (i.e., also referred to as 5' and 
3' untranslated regions). 

Given the coding strand sequences encoding the NOVX protein disclosed herein, 
antisense nucleic acids of the invention can be designed according to the rules of Watson and 
Crick or Hoogsteen base pairing. The antisense nucleic acid molecule can be complementary 
to the entire coding region of NOVX mRNA, but more preferably is an oligonucleotide that is 
antisense to only a portion of the coding or noncoding region of NOVX mRNA. For example, 
the antisense oligonucleotide can be complementary to the region surrounding the translation 
start site of NOVX mRNA. An antisense oligonucleotide can be, for example, about 5, 10, 15, 
20, 25, 30, 35, 40, 45 or 50 nucleotides in length. An antisense nucleic acid of the invention 
can be constructed using chemical synthesis or enzymatic ligation reactions using procedures 
known in the art. For example, an antisense nucleic acid (e.g., an antisense oligonucleotide) 
can be chemically synthesized using naturally-occurring nucleotides or variously modified 
nucleotides designed to increase the biological stability of the molecules or to increase the 
physical stability of the duplex formed between the antisense and sense nucleic acids (e.g., 
phosphorothioate derivatives and acridine substituted nucleotides can be used). 
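
Designing an antisense oligonucleotide by Watson and Crick base pairing amounts to taking the reverse complement of the chosen sense region. The minimal sketch below does this for a DNA coding strand and for a window around a translation start site. It does not model Hoogsteen pairing or modified nucleotides, and the function names and the `flank` parameter are hypothetical.

```python
# Watson-Crick complements for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_dna(sense):
    """Return the reverse complement of a sense strand DNA sequence."""
    return sense.translate(_COMPLEMENT)[::-1]


def antisense_around_start(coding_strand, start_index, flank=10):
    """Antisense oligonucleotide covering the start codon plus flanking bases.

    `start_index` is the position of the A of the ATG start codon in
    `coding_strand`; `flank` bases are taken on each side where available.
    """
    begin = max(0, start_index - flank)
    end = start_index + 3 + flank
    return antisense_dna(coding_strand[begin:end])
```

For example, `antisense_dna("ATGGCC")` gives `"GGCCAT"`, and `antisense_around_start("GGGATGCCC", 3, flank=2)` gives `"GGCATCC"`, both written 5' to 3'.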

Examples of modified nucleotides that can be used to generate the antisense nucleic 
acid include: 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, 
xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl- 
2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, 
inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 
2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 
7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, 
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, 
queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, 
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil, 
3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine. Alternatively, the 
antisense nucleic acid can be produced biologically using an expression vector into which a 
nucleic acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the 
inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest, 
described further in the following subsection). 

The antisense nucleic acid molecules of the invention are typically administered to a 
subject or generated in situ such that they hybridize with or bind to cellular mRNA and/or 
genomic DNA encoding an NOVX protein to thereby inhibit expression of the protein (e.g., by 
inhibiting transcription and/or translation). The hybridization can be by conventional 
nucleotide complementarity to form a stable duplex, or, for example, in the case of an 
antisense nucleic acid molecule that binds to DNA duplexes, through specific interactions in 
the major groove of the double helix. An example of a route of administration of antisense 
nucleic acid molecules of the invention includes direct injection at a tissue site. Alternatively, 
antisense nucleic acid molecules can be modified to target selected cells and then administered 
systemically. For example, for systemic administration, antisense molecules can be modified 
such that they specifically bind to receptors or antigens expressed on a selected cell surface 
(e.g., by linking the antisense nucleic acid molecules to peptides or antibodies that bind to cell 
surface receptors or antigens). The antisense nucleic acid molecules can also be delivered to 
cells using the vectors described herein. To achieve sufficient nucleic acid molecules, vector 
constructs in which the antisense nucleic acid molecule is placed under the control of a strong 
pol II or pol III promoter are preferred. 

In yet another embodiment, the antisense nucleic acid molecule of the invention is an 
α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific 
double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the 
strands run parallel to each other. See, e.g., Gaultier, et al., 1987. Nucl. Acids Res. 15: 
6625-6641. The antisense nucleic acid molecule can also comprise a 
2'-o-methylribonucleotide (See, e.g., Inoue, et al., 1987. Nucl. Acids Res. 15: 6131-6148) or a 
chimeric RNA-DNA analogue (See, e.g., Inoue, et al., 1987. FEBS Lett. 215: 327-330). 

Ribozymes and PNA Moieties 

Nucleic acid modifications include, by way of non-limiting example, modified bases, 
and nucleic acids whose sugar phosphate backbones are modified or derivatized. These 
modifications are carried out at least in part to enhance the chemical stability of the modified 
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in 
therapeutic applications in a subject. 

In one embodiment, an antisense nucleic acid of the invention is a ribozyme. 
Ribozymes are catalytic RNA molecules with ribonuclease activity that are capable of 
cleaving a single-stranded nucleic acid, such as an mRNA, to which they have a 
complementary region. Thus, ribozymes (e.g., hammerhead ribozymes as described in 
Haseloff and Gerlach 1988. Nature 334: 585-591) can be used to catalytically cleave NOVX 
mRNA transcripts to thereby inhibit translation of NOVX mRNA. A ribozyme having 
specificity for an NOVX-encoding nucleic acid can be designed based upon the nucleotide 
sequence of an NOVX cDNA disclosed herein (i.e., SEQ ID NO: 2n-1, wherein n is an integer 
between 1 and 178). For example, a derivative of a Tetrahymena L-19 IVS RNA can be 
constructed in which the nucleotide sequence of the active site is complementary to the 
nucleotide sequence to be cleaved in an NOVX-encoding mRNA. See, e.g., U.S. Patent 
4,987,071 to Cech, et al. and U.S. Patent 5,116,742 to Cech, et al. NOVX mRNA can also be 
used to select a catalytic RNA having a specific ribonuclease activity from a pool of RNA 
molecules. See, e.g., Bartel et al., (1993) Science 261:1411-1418. 

Alternatively, NOVX gene expression can be inhibited by targeting nucleotide 
sequences complementary to the regulatory region of the NOVX nucleic acid (e.g., the NOVX 
promoter and/or enhancers) to form triple helical structures that prevent transcription of the 
NOVX gene in target cells. See, e.g., Helene, 1991. Anticancer Drug Des. 6: 569-84; Helene, 
et al. 1992. Ann. N.Y. Acad. Sci. 660: 27-36; Maher, 1992. Bioassays 14: 807-15. 

In various embodiments, the NOVX nucleic acids can be modified at the base moiety, 
sugar moiety or phosphate backbone to improve, e.g., the stability, hybridization, or solubility 
of the molecule. For example, the deoxyribose phosphate backbone of the nucleic acids can 
be modified to generate peptide nucleic acids. See, e.g., Hyrup, et al., 1996. Bioorg Med 
Chem 4: 5-23. As used herein, the terms "peptide nucleic acids" or "PNAs" refer to nucleic 
acid mimics (e.g., DNA mimics) in which the deoxyribose phosphate backbone is replaced by 
a pseudopeptide backbone and only the four natural nucleobases are retained. The neutral 
backbone of PNAs has been shown to allow for specific hybridization to DNA and RNA under 
conditions of low ionic strength. The synthesis of PNA oligomers can be performed using 
standard solid phase peptide synthesis protocols as described in Hyrup, et al., 1996. supra; 
Perry-O'Keefe, et al., 1996. Proc. Natl. Acad. Sci. USA 93: 14670-14675. 

PNAs of NOVX can be used in therapeutic and diagnostic applications. For example, 
PNAs can be used as antisense or antigene agents for sequence-specific modulation of gene 
expression by, e.g., inducing transcription or translation arrest or inhibiting replication. PNAs 
of NOVX can also be used, for example, in the analysis of single base pair mutations in a gene 
(e.g., PNA directed PCR clamping); as artificial restriction enzymes when used in combination 
with other enzymes, e.g., S1 nucleases (See, Hyrup, et al., 1996. supra); or as probes or primers 
for DNA sequence and hybridization (See, Hyrup, et al., 1996. supra; Perry-O'Keefe, et al., 
1996. supra). 

In another embodiment, PNAs of NOVX can be modified, e.g., to enhance their 
stability or cellular uptake, by attaching lipophilic or other helper groups to PNA, by the 
formation of PNA-DNA chimeras, or by the use of liposomes or other techniques of drug 
delivery known in the art. For example, PNA-DNA chimeras of NOVX can be generated that 
may combine the advantageous properties of PNA and DNA. Such chimeras allow DNA 
recognition enzymes (e.g., RNase H and DNA polymerases) to interact with the DNA portion 
while the PNA portion would provide high binding affinity and specificity. PNA-DNA 
chimeras can be linked using linkers of appropriate lengths selected in terms of base stacking, 
number of bonds between the nucleobases, and orientation (see, Hyrup, et al., 1996. supra). 
The synthesis of PNA-DNA chimeras can be performed as described in Hyrup, et al., 1996. 
supra and Finn, et al., 1996. Nucl Acids Res 24: 3357-3363. For example, a DNA chain can 
be synthesized on a solid support using standard phosphoramidite coupling chemistry, and 
modified nucleoside analogs, e.g., 5'-(4-methoxytrityl)amino-5'-deoxy-thymidine 
phosphoramidite, can be used between the PNA and the 5' end of DNA. See, e.g., Mag, et al., 
1989. Nucl Acid Res 17: 5973-5988. PNA monomers are then coupled in a stepwise manner 
to produce a chimeric molecule with a 5' PNA segment and a 3' DNA segment. See, e.g., 
Finn, et al., 1996. supra. Alternatively, chimeric molecules can be synthesized with a 5' DNA 
segment and a 3' PNA segment. See, e.g., Petersen, et al., 1975. Bioorg. Med. Chem. Lett. 5: 
1119-1124. 

In other embodiments, the oligonucleotide may include other appended groups such as 
peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating transport across 
the cell membrane (see, e.g., Letsinger, et al., 1989. Proc. Natl. Acad. Sci. U.S.A. 86: 
6553-6556; Lemaitre, et al., 1987. Proc. Natl. Acad. Sci. 84: 648-652; PCT Publication No. 
WO88/09810) or the blood-brain barrier (see, e.g., PCT Publication No. WO 89/10134). In 
addition, oligonucleotides can be modified with hybridization triggered cleavage agents (see, 
e.g., Krol, et al., 1988. BioTechniques 6:958-976) or intercalating agents (see, e.g., Zon, 1988. 
Pharm. Res. 5: 539-549). To this end, the oligonucleotide may be conjugated to another 
molecule, e.g., a peptide, a hybridization triggered cross-linking agent, a transport agent, a 
hybridization-triggered cleavage agent, and the like. 

NOVX Polypeptides 

A polypeptide according to the invention includes a polypeptide including the amino 
acid sequence of NOVX polypeptides whose sequences are provided in SEQ ID NO: 2n, 
wherein n is an integer between 1 and 178. The invention also includes a mutant or variant 
protein any of whose residues may be changed from the corresponding residues shown in SEQ 
ID NO: 2n, wherein n is an integer between 1 and 178 while still encoding a protein that 
maintains its NOVX activities and physiological functions, or a functional fragment thereof. 

In general, an NOVX variant that preserves NOVX-like function includes any variant 
in which residues at a particular position in the sequence have been substituted by other amino 
acids, and further include the possibility of inserting an additional residue or residues between 
two residues of the parent protein as well as the possibility of deleting one or more residues 
from the parent sequence. Any amino acid substitution, insertion, or deletion is encompassed 
by the invention. In favorable circumstances, the substitution is a conservative substitution as 
defined above. 

One aspect of the invention pertains to isolated NOVX proteins, and biologically- 
active portions thereof, or derivatives, fragments, analogs or homologs thereof. Also provided 
are polypeptide fragments suitable for use as immunogens to raise anti-NOVX antibodies. In 
one embodiment, native NOVX proteins can be isolated from cells or tissue sources by an 
appropriate purification scheme using standard protein purification techniques. In another 
embodiment, NOVX proteins are produced by recombinant DNA techniques. Alternative to 
recombinant expression, an NOVX protein or polypeptide can be synthesized chemically 
using standard peptide synthesis techniques. 

An "isolated" or "purified" polypeptide or protein or biologically-active portion thereof 
is substantially free of cellular material or other contaminating proteins from the cell or tissue 
source from which the NOVX protein is derived, or substantially free from chemical 
precursors or other chemicals when chemically synthesized. The language "substantially free 
of cellular material" includes preparations of NOVX proteins in which the protein is separated 
from cellular components of the cells from which it is isolated or recombinantly-produced. In 
one embodiment, the language "substantially free of cellular material" includes preparations of 
NOVX proteins having less than about 30% (by dry weight) of non-NOVX proteins (also 
referred to herein as a "contaminating protein"), more preferably less than about 20% of 
non-NOVX proteins, still more preferably less than about 10% of non-NOVX proteins, and 
most preferably less than about 5% of non-NOVX proteins. When the NOVX protein or 
biologically-active portion thereof is recombinantly-produced, it is also preferably 
substantially free of culture medium, i.e., culture medium represents less than about 20%, 
more preferably less than about 10%, and most preferably less than about 5% of the volume of 
the NOVX protein preparation. 
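
By way of illustration only, the dry-weight thresholds recited above can be expressed as a 
short calculation. The following Python sketch is not part of the disclosed methods; the 
function names and tier labels are hypothetical, and the cutoffs are treated as exact values 
even though the text recites them as "about" figures. 

```python
# Illustrative sketch: classify a preparation by its percentage (dry weight)
# of contaminating, non-NOVX protein. Names and labels are hypothetical.
PURITY_TIERS = [
    (5.0, "most preferred (<5%)"),
    (10.0, "still more preferred (<10%)"),
    (20.0, "more preferred (<20%)"),
    (30.0, "substantially free (<30%)"),
]


def contaminant_percent(target_mass_mg, contaminant_mass_mg):
    """Percent of total dry weight contributed by contaminating protein."""
    total = target_mass_mg + contaminant_mass_mg
    if total <= 0:
        raise ValueError("total dry weight must be positive")
    return 100.0 * contaminant_mass_mg / total


def purity_tier(percent):
    """Return the tightest tier whose upper limit exceeds the given percent."""
    for limit, label in PURITY_TIERS:
        if percent < limit:
            return label
    return "not substantially free"


if __name__ == "__main__":
    pct = contaminant_percent(target_mass_mg=92.0, contaminant_mass_mg=8.0)
    print(pct, purity_tier(pct))
```
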

The language "substantially free of chemical precursors or other chemicals" includes 
preparations of NOVX proteins in which the protein is separated from chemical precursors or 
other chemicals that are involved in the synthesis of the protein. In one embodiment, the 
language "substantially free of chemical precursors or other chemicals" includes preparations 
of NOVX proteins having less than about 30% (by dry weight) of chemical precursors or 
non-NOVX chemicals, more preferably less than about 20% chemical precursors or 
non-NOVX chemicals, still more preferably less than about 10% chemical precursors or 
non-NOVX chemicals, and most preferably less than about 5% chemical precursors or 
non-NOVX chemicals. 

Biologically-active portions of NOVX proteins include peptides comprising amino 
acid sequences sufficiently homologous to or derived from the amino acid sequences of the 
NOVX proteins (e.g., the amino acid sequence shown in SEQ ID NO: 2n, wherein n is an 
integer between 1 and 178) that include fewer amino acids than the full-length NOVX 
proteins, and exhibit at least one activity of an NOVX protein. Typically, biologically-active 
portions comprise a domain or motif with at least one activity of the NOVX protein. A 
biologically-active portion of an NOVX protein can be a polypeptide which is, for example, 
10, 25, 50, 100 or more amino acid residues in length. 

Moreover, other biologically-active portions, in which other regions of the protein are deleted, 
can be prepared by recombinant techniques and evaluated for one or more of the functional 
activities of a native NOVX protein. 

In an embodiment, the NOVX protein has an amino acid sequence shown in SEQ ID NO: 
2n, wherein n is an integer between 1 and 178. In other embodiments, the NOVX protein is 
substantially homologous to SEQ ID NO: 2n, wherein n is an integer between 1 and 178, and 
retains the functional activity of the protein of SEQ ID NO: 2n, wherein n is an integer 
between 1 and 178, yet differs in amino acid sequence due to natural allelic variation or 
mutagenesis, as described in detail below. Accordingly, in another embodiment, the NOVX 
protein is a protein that comprises an amino acid sequence at least about 45% homologous to 
the amino acid sequence SEQ ID NO: 2n, wherein n is an integer between 1 and 178, and 
retains the functional activity of the NOVX proteins of SEQ ID NO: 2n, wherein n is an 
integer between 1 and 178. 

Determining Homology Between Two or More Sequences 

To determine the percent homology of two amino acid sequences or of two nucleic 
acids, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced 
in the sequence of a first amino acid or nucleic acid sequence for optimal alignment with a 
second amino acid or nucleic acid sequence). The amino acid residues or nucleotides at 
corresponding amino acid positions or nucleotide positions are then compared. When a 
position in the first sequence is occupied by the same amino acid residue or nucleotide as the 
corresponding position in the second sequence, then the molecules are homologous at that 
position (i.e., as used herein amino acid or nucleic acid "homology" is equivalent to amino 
acid or nucleic acid "identity"). 

The nucleic acid sequence homology may be determined as the degree of identity between two 
sequences. The homology may be determined using computer programs known in the art, 
such as GAP software provided in the GCG program package. See, Needleman and Wunsch, 
1970. J Mol Biol 48: 443-453. Using GCG GAP software with the following settings for 
nucleic acid sequence comparison: GAP creation penalty of 5.0 and GAP extension penalty of 
0.3, the coding region of the analogous nucleic acid sequences referred to above exhibits a 
degree of identity preferably of at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99%, with 
the CDS (encoding) part of the DNA from the group consisting of SEQ ID NO: 2n-1, wherein 
n is an integer between 1 and 178. 
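
For orientation, a global alignment of the kind introduced by Needleman and Wunsch can be 
sketched in a few lines of Python. The sketch below is a simplified illustration and is not 
the GCG GAP program: it uses a single linear gap penalty rather than separate gap creation 
and gap extension penalties, and the match, mismatch and gap scores are assumed values 
chosen only for the example. 

```python
# Simplified Needleman-Wunsch global alignment with a linear gap penalty.
# Scores are illustrative assumptions, not the GCG GAP settings.
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The two returned strings have equal length and contain "-" at gap positions, so they can be 
compared position by position to obtain a degree of identity. 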

The term "sequence identity" refers to the degree to which two polynucleotide or 
polypeptide sequences are identical on a residue-by-residue basis over a particular region of 
comparison. The term "percentage of sequence identity" is calculated by comparing two 
optimally aligned sequences over that region of comparison, determining the number of 
positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I, in the case of 
nucleic acids) occurs in both sequences to yield the number of matched positions, dividing the 
number of matched positions by the total number of positions in the region of comparison (i.e., 
the window size), and multiplying the result by 100 to yield the percentage of sequence 
identity. The term "substantial identity" as used herein denotes a characteristic of a 
polynucleotide sequence, wherein the polynucleotide comprises a sequence that has at least 80 
percent sequence identity, preferably at least 85 percent identity and often 90 to 95 percent 
sequence identity, more usually at least 99 percent sequence identity as compared to a 
reference sequence over a comparison region. 
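
The calculation of "percentage of sequence identity" described above can be illustrated 
with the following Python sketch. It assumes that the two sequences have already been 
optimally aligned over the region of comparison and are supplied as equal-length strings in 
which "-" marks a gap; the function names are hypothetical, and the 80 percent default 
simply mirrors the lowest "substantial identity" figure recited above. 

```python
# Percentage of sequence identity over a pre-aligned region of comparison.
# Inputs are equal-length aligned strings; "-" denotes a gap (an assumption).
def percent_identity(aligned_a, aligned_b, gap="-"):
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("region of comparison is empty")
    # A matched position holds the same residue in both sequences (never a gap).
    matched = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    # Divide by the window size and scale to a percentage.
    return 100.0 * matched / len(aligned_a)


def is_substantially_identical(aligned_a, aligned_b, threshold=80.0):
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    print(is_substantially_identical("ACGT-A", "ACCTGA"))
```
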

Chimeric and Fusion Proteins 

The invention also provides NOVX chimeric or fusion proteins. As used herein, an 
NOVX "chimeric protein" or "fusion protein" comprises an NOVX polypeptide operatively- 
linked to a non-NOVX polypeptide. An "NOVX polypeptide" refers to a polypeptide having 
an amino acid sequence corresponding to an NOVX protein SEQ ID NO: 2n, wherein n is an 
integer between 1 and 178, whereas a "non-NOVX polypeptide" refers to a polypeptide having 
an amino acid sequence corresponding to a protein that is not substantially homologous to the 
NOVX protein, e.g., a protein that is different from the NOVX protein and that is derived from 
the same or a different organism. Within an NOVX fusion protein the NOVX polypeptide can 
correspond to all or a portion of an NOVX protein. In one embodiment, an NOVX fusion 
protein comprises at least one biologically-active portion of an NOVX protein. In another 
embodiment, an NOVX fusion protein comprises at least two biologically-active portions of 
an NOVX protein. In yet another embodiment, an NOVX fusion protein comprises at least 
three biologically-active portions of an NOVX protein. Within the fusion protein, the term 
"operatively-linked" is intended to indicate that the NOVX polypeptide and the non-NOVX 
polypeptide are fused in-frame with one another. The non-NOVX polypeptide can be fused to 
the N-terminus or C-terminus of the NOVX polypeptide. 

In one embodiment, the fusion protein is a GST-NOVX fusion protein in which the 
NOVX sequences are fused to the C-terminus of the GST (glutathione S-transferase) 
sequences. Such fusion proteins can facilitate the purification of recombinant NOVX 
polypeptides. 

In another embodiment, the fusion protein is an NOVX protein containing a heterologous 
signal sequence at its N-terminus. In certain host cells (e.g., mammalian host cells), 
expression and/or secretion of NOVX can be increased through use of a heterologous signal 
sequence. 

In yet another embodiment, the fusion protein is an NOVX-immunoglobulin fusion 
protein in which the NOVX sequences are fused to sequences derived from a member of the 
immunoglobulin protein family. The NOVX-immunoglobulin fusion proteins of the invention 
can be incorporated into pharmaceutical compositions and administered to a subject to inhibit 
an interaction between an NOVX ligand and an NOVX protein on the surface of a cell, to 
thereby suppress NOVX-mediated signal transduction in vivo. The NOVX-immunoglobulin 
fusion proteins can be used to affect the bioavailability of an NOVX cognate ligand. 
Inhibition of the NOVX ligand/NOVX interaction may be useful therapeutically both for the 
treatment of proliferative and differentiative disorders and for modulating (e.g., promoting 
or inhibiting) cell survival. Moreover, the NOVX-immunoglobulin fusion proteins of the 
invention can be used as immunogens to produce anti-NOVX antibodies in a subject, to purify 
NOVX ligands, and in screening assays to identify molecules that inhibit the interaction of 
NOVX with an NOVX ligand. 

An NOVX chimeric or fusion protein of the invention can be produced by standard 
recombinant DNA techniques. For example, DNA fragments coding for the different 
polypeptide sequences are ligated together in-frame in accordance with conventional 
techniques, e.g., by employing blunt-ended or stagger-ended termini for ligation, restriction 
enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, 
alkaline phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In 
another embodiment, the fusion gene can be synthesized by conventional techniques including 
automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can be 
carried out using anchor primers that give rise to complementary overhangs between two 
consecutive gene fragments that can subsequently be annealed and reamplified to generate a 
chimeric gene sequence (see, e.g., Ausubel, et al. (eds.) Current Protocols in Molecular 
Biology, John Wiley & Sons, 1992). Moreover, many expression vectors are commercially 
available that already encode a fusion moiety (e.g., a GST polypeptide). An NOVX-encoding 
nucleic acid can be cloned into such an expression vector such that the fusion moiety is linked 
in-frame to the NOVX protein. 
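
The requirement that the fused coding sequences remain in-frame can be checked 
computationally before any cloning is attempted. The Python sketch below is illustrative 
only: the function names are hypothetical, the example sequences are toy sequences rather 
than actual GST or NOVX coding sequences, and the check assumes that each input is a 
complete coding sequence beginning at the first base of its first codon. 

```python
# Illustrative in-frame fusion check for two coding sequences (CDS).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(cds):
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def fuse_in_frame(upstream_cds, downstream_cds, linker=""):
    """Join upstream + linker + downstream, keeping one open reading frame."""
    parts = [p.upper() for p in (upstream_cds, linker, downstream_cds)]
    for name, part in zip(("upstream", "linker", "downstream"), parts):
        if len(part) % 3 != 0:
            raise ValueError(f"{name} length is not a multiple of 3")
    upstream, link, downstream = parts
    # Drop the upstream partner's own stop codon so translation reads through.
    if upstream[-3:] in STOP_CODONS:
        upstream = upstream[:-3]
    fused = upstream + link + downstream
    # Any stop codon before the final codon would truncate the fusion protein.
    if any(codon in STOP_CODONS for codon in split_codons(fused)[:-1]):
        raise ValueError("premature stop codon in fused reading frame")
    return fused


if __name__ == "__main__":
    # Toy sequences: an upstream partner fused to a downstream partner.
    print(fuse_in_frame("ATGGCCTAA", "ATGAAATGA"))
```
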

NOVX Agonists and Antagonists 

The invention also pertains to variants of the NOVX proteins that function as either 
NOVX agonists (i.e., mimetics) or as NOVX antagonists. Variants of the NOVX protein can 
be generated by mutagenesis (e.g., discrete point mutation or truncation of the NOVX protein). 
An agonist of the NOVX protein can retain substantially the same, or a subset of, the 
biological activities of the naturally occurring form of the NOVX protein. An antagonist of 
the NOVX protein can inhibit one or more of the activities of the naturally occurring form of 
the NOVX protein by, for example, competitively binding to a downstream or upstream 
member of a cellular signaling cascade which includes the NOVX protein. Thus, specific 
biological effects can be elicited by treatment with a variant of limited function. In one 
embodiment, treatment of a subject with a variant having a subset of the biological activities 
of the naturally occurring form of the protein has fewer side effects in a subject relative to 
treatment with the naturally occurring form of the NOVX proteins. 

Variants of the NOVX proteins that function as either NOVX agonists (i.e., mimetics) 
or as NOVX antagonists can be identified by screening combinatorial libraries of mutants 
(e.g., truncation mutants) of the NOVX proteins for NOVX protein agonist or antagonist 
activity. In one embodiment, a variegated library of NOVX variants is generated by 
combinatorial mutagenesis at the nucleic acid level and is encoded by a variegated gene 
library. A variegated library of NOVX variants can be produced by, for example, 
enzymatically ligating a mixture of synthetic oligonucleotides into gene sequences such that a 
degenerate set of potential NOVX sequences is expressible as individual polypeptides, or 
alternatively, as a set of larger fusion proteins (e.g., for phage display) containing the set of 
NOVX sequences therein. There are a variety of methods which can be used to produce 
libraries of potential NOVX variants from a degenerate oligonucleotide sequence. Chemical 
synthesis of a degenerate gene sequence can be performed in an automatic DNA synthesizer, 
and the synthetic gene then ligated into an appropriate expression vector. Use of a degenerate 
set of genes allows for the provision, in one mixture, of all of the sequences encoding the 
desired set of potential NOVX sequences. Methods for synthesizing degenerate 
oligonucleotides are well-known within the art. See, e.g., Narang, 1983. Tetrahedron 39: 3; 
Itakura, et al., 1984. Annu. Rev. Biochem. 53: 323; Itakura, et al., 1984. Science 198: 1056; 
Ike, et al., 1983. Nucl. Acids Res. 11: 477. 
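
To illustrate how a single degenerate oligonucleotide specifies a mixture of sequences, the 
following Python sketch expands a degenerate sequence written in the standard IUPAC 
nucleotide ambiguity codes into every sequence it encodes. The function names are 
hypothetical and the example oligonucleotides are arbitrary. 

```python
# Expand a degenerate oligonucleotide (IUPAC ambiguity codes) into the set of
# concrete sequences it represents. Function names are hypothetical.
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def library_size(oligo):
    """Number of distinct sequences encoded, without enumerating them."""
    size = 1
    for base in oligo.upper():
        size *= len(IUPAC[base])
    return size


def expand_degenerate(oligo):
    """List every concrete sequence encoded by the degenerate oligo."""
    pools = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*pools)]


if __name__ == "__main__":
    print(expand_degenerate("ARY"))
    print(library_size("NNK"))
```

Because the number of encoded sequences grows multiplicatively with each degenerate 
position, it is usually sensible to compute the library size before enumerating the members. 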

Polypeptide Libraries 

In addition, libraries of fragments of the NOVX protein coding sequences can be used 
to generate a variegated population of NOVX fragments for screening and subsequent 
selection of variants of an NOVX protein. In one embodiment, a library of coding sequence 
fragments can be generated by treating a double stranded PCR fragment of an NOVX coding 
sequence with a nuclease under conditions wherein nicking occurs only about once per 
molecule, denaturing the double stranded DNA, renaturing the DNA to form double-stranded 
DNA that can include sense/antisense pairs from different nicked products, removing single 
stranded portions from reformed duplexes by treatment with S1 nuclease, and ligating the 
resulting fragment library into an expression vector. By this method, expression libraries can 
be derived which encode N-terminal and internal fragments of various sizes of the NOVX 
proteins. 

Various techniques are known in the art for screening gene products of combinatorial 
libraries made by point mutations or truncation, and for screening cDNA libraries for gene 
products having a selected property. Such techniques are adaptable for rapid screening of the 
gene libraries generated by the combinatorial mutagenesis of NOVX proteins. The most 
widely used techniques, which are amenable to high throughput analysis, for screening large 
gene libraries typically include cloning the gene library into replicable expression vectors, 
transforming appropriate cells with the resulting library of vectors, and expressing the 
combinatorial genes under conditions in which detection of a desired activity facilitates 
isolation of the vector encoding the gene whose product was detected. Recursive ensemble 
mutagenesis (REM), a new technique that enhances the frequency of functional mutants in the 
libraries, can be used in combination with the screening assays to identify NOVX variants. 
See, e.g., Arkin and Youvan, 1992. Proc. Natl. Acad. Sci. USA 89: 7811-7815; Delagrave, et 
al., 1993. Protein Engineering 6: 327-331. 

NOVX Antibodies 

The term "antibody" as used herein refers to immunoglobulin molecules and 
immunologically active portions of immunoglobulin (Ig) molecules, i.e., molecules that 
contain an antigen binding site that specifically binds (immunoreacts with) an antigen. Such 
antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, Fab, 
Fab' and F(ab')2 fragments, and an Fab expression library. In general, antibody molecules 
obtained from humans relate to any of the classes IgG, IgM, IgA, IgE and IgD, which differ 
from one another by the nature of the heavy chain present in the molecule. Certain classes 
have subclasses as well, such as IgG1, IgG2, and others. Furthermore, in humans, the light 
chain may be a kappa chain or a lambda chain. Reference herein to antibodies includes a 
reference to all such classes, subclasses and types of human antibody species. 

An isolated protein of the invention intended to serve as an antigen, or a portion or 
fragment thereof, can be used as an immunogen to generate antibodies that 
immunospecifically bind the antigen, using standard techniques for polyclonal and monoclonal 
antibody preparation. The full-length protein can be used or, alternatively, the invention 
provides antigenic peptide fragments of the antigen for use as immunogens. An antigenic 
peptide fragment comprises at least 6 amino acid residues of the amino acid sequence of the 
full length protein, such as an amino acid sequence shown in SEQ ID NO: 2n, wherein n is an 
integer between 1 and 178, and encompasses an epitope thereof such that an antibody raised 
against the peptide forms a specific immune complex with the full length protein or with any 
fragment that contains the epitope. Preferably, the antigenic peptide comprises at least 10 
amino acid residues, or at least 15 amino acid residues, or at least 20 amino acid residues, or at 
least 30 amino acid residues. Preferred epitopes encompassed by the antigenic peptide are 
regions of the protein that are located on its surface; commonly these are hydrophilic regions. 

In certain embodiments of the invention, at least one epitope encompassed by the 
antigenic peptide is a region of NOVX that is located on the surface of the protein, e.g., a 
hydrophilic region. A hydrophobicity analysis of the human NOVX protein sequence will 
indicate which regions of an NOVX polypeptide are particularly hydrophilic and, therefore, are 
likely to encode surface residues useful for targeting antibody production. As a means for 
targeting antibody production, hydropathy plots showing regions of hydrophilicity and 
hydrophobicity may be generated by any method well known in the art, including, for 
example, the Kyte-Doolittle or the Hopp-Woods methods, either with or without Fourier 
transformation. See, e.g., Hopp and Woods, 1981, Proc. Natl. Acad. Sci. USA 78: 3824-3828; 
Kyte and Doolittle, 1982, J. Mol. Biol. 157: 105-142, each incorporated herein by reference in 
their entirety. Antibodies that are specific for one or more domains within an antigenic protein, 
or derivatives, fragments, analogs or homologs thereof, are also provided herein. 
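
A hydropathy profile of the kind referred to above can be computed with a simple sliding 
window. The Python sketch below uses the published Kyte-Doolittle hydropathy values and 
omits any Fourier transformation; the window length of 9 residues, the cutoff of 0.0 and 
the function names are assumptions made for the example, and the peptide shown is an 
arbitrary toy sequence rather than an NOVX sequence. 

```python
# Sliding-window hydropathy profile using the Kyte-Doolittle scale.
# Window length and cutoff are illustrative assumptions.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of each window; higher values are more hydrophobic."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophilic_windows(sequence, window=9, cutoff=0.0):
    """Half-open (start, end) index pairs of windows scoring below the cutoff."""
    profile = hydropathy_profile(sequence, window)
    return [(i, i + window) for i, score in enumerate(profile) if score < cutoff]


if __name__ == "__main__":
    toy_peptide = "MKKLLIVAGDEEKRSTNQ"
    for start, end in hydrophilic_windows(toy_peptide):
        print(start, end, toy_peptide[start:end])
```

Windows with low mean hydropathy are candidate hydrophilic, and therefore potentially 
surface-exposed, regions; such a prediction is only a guide to selecting antigenic peptides. 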

A protein of the invention, or a derivative, fragment, analog, homolog or ortholog 
thereof, may be utilized as an immunogen in the generation of antibodies that 
immunospecifically bind these protein components. 

Various procedures known within the art may be used for the production of polyclonal 
or monoclonal antibodies directed against a protein of the invention, or against derivatives, 
fragments, analogs, homologs or orthologs thereof (see, for example, Antibodies: A Laboratory 
Manual, Harlow E, and Lane D, 1988, Cold Spring Harbor Laboratory Press, Cold Spring 
Harbor, NY, incorporated herein by reference). Some of these antibodies are discussed below. 

Polyclonal Antibodies 

For the production of polyclonal antibodies, various suitable host animals (e.g., rabbit, 
goat, mouse or other mammal) may be immunized by one or more injections with the native 
protein, a synthetic variant thereof, or a derivative of the foregoing. An appropriate 
immunogenic preparation can contain, for example, the naturally occurring immunogenic 
protein, a chemically synthesized polypeptide representing the immunogenic protein, or a 
recombinantly expressed immunogenic protein. Furthermore, the protein may be conjugated 
to a second protein known to be immunogenic in the mammal being immunized. Examples of 
such immunogenic proteins include but are not limited to keyhole limpet hemocyanin, serum 
albumin, bovine thyroglobulin, and soybean trypsin inhibitor. The preparation can further 
include an adjuvant. Various adjuvants used to increase the immunological response include, 
but are not limited to, Freund's (complete and incomplete), mineral gels (e.g., aluminum 
hydroxide), surface active substances (e.g., lysolecithin, pluronic polyols, polyanions, 
peptides, oil emulsions, dinitrophenol, etc.), adjuvants usable in humans such as Bacille 
Calmette-Guerin and Corynebacterium parvum, or similar immunostimulatory agents. 
Additional examples of adjuvants which can be employed include MPL-TDM adjuvant 
(monophosphoryl Lipid A, synthetic trehalose dicorynomycolate). 

The polyclonal antibody molecules directed against the immunogenic protein can be 
isolated from the mammal (e.g., from the blood) and further purified by well known 
techniques, such as affinity chromatography using protein A or protein G, which provide 
primarily the IgG fraction of immune serum. Subsequently, or alternatively, the specific 
antigen which is the target of the immunoglobulin sought, or an epitope thereof, may be 
immobilized on a column to purify the immune specific antibody by immunoaffinity 
chromatography. Purification of immunoglobulins is discussed, for example, by D. Wilkinson 
(The Scientist, published by The Scientist, Inc., Philadelphia PA, Vol. 14, No. 8 (April 17, 
2000), pp. 25-28). 

Monoclonal Antibodies 

The term "monoclonal antibody" (MAb) or "monoclonal antibody composition", as 
used herein, refers to a population of antibody molecules that contain only one molecular 
species of antibody molecule consisting of a unique light chain gene product and a unique 
heavy chain gene product. In particular, the complementarity determining regions (CDRs) of 
the monoclonal antibody are identical in all the molecules of the population. MAbs thus 
contain an antigen binding site capable of immunoreacting with a particular epitope of the 
antigen characterized by a unique binding affinity for it. 

Monoclonal antibodies can be prepared using hybridoma methods, such as those 
described by Kohler and Milstein, Nature, 256:495 (1975). In a hybridoma method, a mouse, 
hamster, or other appropriate host animal, is typically immunized with an immunizing agent to 
elicit lymphocytes that produce or are capable of producing antibodies that will specifically 
bind to the immunizing agent. Alternatively, the lymphocytes can be immunized in vitro. 

The immunizing agent will typically include the protein antigen, a fragment thereof or 
a fusion protein thereof. Generally, either peripheral blood lymphocytes are used if cells of 
human origin are desired, or spleen cells or lymph node cells are used if non-human 
mammalian sources are desired. The lymphocytes are then fused with an immortalized cell 
line using a suitable fusing agent, such as polyethylene glycol, to form a hybridoma cell 
[Goding, Monoclonal Antibodies: Principles and Practice, Academic Press, (1986) pp. 
59-103]. Immortalized cell lines are usually transformed mammalian cells, particularly myeloma 
cells of rodent, bovine and human origin. Usually, rat or mouse myeloma cell lines are 
employed. The hybridoma cells can be cultured in a suitable culture medium that preferably 
contains one or more substances that inhibit the growth or survival of the unfused, 
immortalized cells. For example, if the parental cells lack the enzyme hypoxanthine guanine 
phosphoribosyl transferase (HGPRT or HPRT), the culture medium for the hybridomas 
typically will include hypoxanthine, aminopterin, and thymidine ("HAT medium"), which 
substances prevent the growth of HGPRT-deficient cells. 

Preferred immortalized cell lines are those that fuse efficiently, support stable high level 
expression of antibody by the selected antibody-producing cells, and are sensitive to a medium 
such as HAT medium. More preferred immortalized cell lines are murine myeloma lines, 
which can be obtained, for instance, from the Salk Institute Cell Distribution Center, San 
Diego, California and the American Type Culture Collection, Manassas, Virginia. Human 
myeloma and mouse-human heteromyeloma cell lines also have been described for the 
production of human monoclonal antibodies [Kozbor, J. Immunol., 133:3001 (1984); Brodeur 
et al., Monoclonal Antibody Production Techniques and Applications, Marcel Dekker, Inc., 
New York, (1987) pp. 51-63]. 

The culture medium in which the hybridoma cells are cultured can then be assayed for 
the presence of monoclonal antibodies directed against the antigen. Preferably, the binding 
specificity of monoclonal antibodies produced by the hybridoma cells is determined by 
immunoprecipitation or by an in vitro binding assay, such as radioimmunoassay (RIA) or 
enzyme-linked immunosorbent assay (ELISA). Such techniques and assays are known in 
the art. The binding affinity of the monoclonal antibody can, for example, be determined by 
the Scatchard analysis of Munson and Pollard, Anal. Biochem., 107:220 (1980). It is an 
objective, especially important in therapeutic applications of monoclonal antibodies, to 
identify antibodies having a high degree of specificity and a high binding affinity for the target 
antigen. 
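
As a simplified illustration of how binding affinity may be estimated from equilibrium 
binding data, the Python sketch below performs a classical linear Scatchard fit for a single 
class of binding sites, in which the ratio of bound to free ligand is plotted against bound 
ligand and the slope equals -1/Kd. It is not the specific procedure of the cited reference; 
the function name, the units and the synthetic data points are assumptions made for the 
example. 

```python
# Classical linear Scatchard fit for a single class of binding sites:
#   bound/free = (Bmax - bound) / Kd
# so a plot of bound/free against bound has slope -1/Kd and x-intercept Bmax.
def scatchard_fit(bound, free):
    """Return (Kd, Bmax) from paired bound and free ligand concentrations."""
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two paired measurements")
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("bound values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # ordinary least-squares slope
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope; data do not fit a one-site model")
    kd = -1.0 / slope
    bmax = -intercept / slope
    return kd, bmax


if __name__ == "__main__":
    # Synthetic one-site data generated with Kd = 2.0 and Bmax = 10.0
    # (arbitrary concentration units).
    free = [0.5, 1.0, 2.0, 4.0, 8.0]
    bound = [10.0 * f / (2.0 + f) for f in free]
    print(scatchard_fit(bound, free))
```

A smaller Kd corresponds to a higher binding affinity. Real data contain measurement 
error, for which nonlinear fitting of the untransformed data is often preferred. 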

After the desired hybridoma cells are identified, the clones can be subcloned by 
limiting dilution procedures and grown by standard methods (Goding, 1986). Suitable culture 
media for this purpose include, for example, Dulbecco's Modified Eagle's Medium and 
RPMI-1640 medium. Alternatively, the hybridoma cells can be grown in vivo as ascites in a 
mammal. 

The monoclonal antibodies secreted by the subclones can be isolated or purified from the 
culture medium or ascites fluid by conventional immunoglobulin purification procedures such 
as, for example, protein A-Sepharose, hydroxylapatite chromatography, gel electrophoresis, 
dialysis, or affinity chromatography. 

The monoclonal antibodies can also be made by recombinant DNA methods, such as 
those described in U.S. Patent No. 4,816,567. DNA encoding the monoclonal antibodies of 
the invention can be readily isolated and sequenced using conventional procedures (e.g., by 
using oligonucleotide probes that are capable of binding specifically to genes encoding the 
heavy and light chains of murine antibodies). The hybridoma cells of the invention serve as a 
preferred source of such DNA. Once isolated, the DNA can be placed into expression vectors, 
which are then transfected into host cells such as simian COS cells, Chinese hamster ovary 
(CHO) cells, or myeloma cells that do not otherwise produce immunoglobulin protein, to 
obtain the synthesis of monoclonal antibodies in the recombinant host cells. The DNA also 
can be modified, for example, by substituting the coding sequence for human heavy and light 
chain constant domains in place of the homologous murine sequences (U.S. Patent No. 
4,816,567; Morrison, Nature 368, 812-13 (1994)) or by covalently joining to the 
immunoglobulin coding sequence all or part of the coding sequence for a non-immunoglobulin 
polypeptide. Such a non-immunoglobulin polypeptide can be substituted for the constant 
domains of an antibody of the invention, or can be substituted for the variable domains of one 
antigen-combining site of an antibody of the invention to create a chimeric bivalent antibody. 

Humanized Antibodies 

The antibodies directed against the protein antigens of the invention can further 
comprise humanized antibodies or human antibodies. These antibodies are suitable for 
administration to humans without engendering an immune response by the human against the 
administered immunoglobulin. Humanized forms of antibodies are chimeric immunoglobulins, 
immunoglobulin chains or fragments thereof (such as Fv, Fab, Fab', F(ab')2 or other 
antigen-binding subsequences of antibodies) that are principally comprised of the sequence of 
a human immunoglobulin, and contain minimal sequence derived from a non-human 
immunoglobulin. Humanization can be performed following the method of Winter and 
co-workers (Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-327 
(1988); Verhoeyen et al., Science, 239:1534-1536 (1988)), by substituting rodent CDRs or CDR 
sequences for the corresponding sequences of a human antibody. (See also U.S. Patent No. 
5,225,539.) In some 
instances, Fv framework residues of the human immunoglobulin are replaced by 
corresponding non-human residues. Humanized antibodies can also comprise residues which 
are found neither in the recipient antibody nor in the imported CDR or framework sequences. 
In general, the humanized antibody will comprise substantially all of at least one, and typically 
two, variable domains, in which all or substantially all of the CDR regions correspond to those 
of a non-human immunoglobulin and all or substantially all of the framework regions are 
those of a human immunoglobulin consensus sequence. The humanized antibody optimally 
also will comprise at least a portion of an immunoglobulin constant region (Fc), typically that 
of a human immunoglobulin (Jones et al., 1986; Riechmann et al., 1988; and Presta, Curr. Op. 
Struct. Biol., 2:593-596 (1992)). 

Human Antibodies 

Fully human antibodies essentially relate to antibody molecules in which the entire 
sequence of both the light chain and the heavy chain, including the CDRs, arises from human 
genes. Such antibodies are termed "human antibodies", or "fully human antibodies" herein. 
Human monoclonal antibodies can be prepared by the trioma technique; the human B-cell 
hybridoma technique (see Kozbor, et al., 1983 Immunol Today 4: 72) and the EBV hybridoma 
technique to produce human monoclonal antibodies (see Cole, et al., 1985 In: Monoclonal 
Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Human monoclonal 
antibodies may be utilized in the practice of the present invention and may be produced by 
using human hybridomas (see Cote, et al., 1983. Proc Natl Acad Sci USA 80: 2026-2030) or 
by transforming human B-cells with Epstein Barr Virus in vitro (see Cole, et al., 1985 In: 
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). 

In addition, human antibodies can also be produced using additional techniques, 
including phage display libraries (Hoogenboom and Winter, J. Mol. Biol., 227:381 (1991); 
Marks et al., J. Mol. Biol., 222:581 (1991)). Similarly, human antibodies can be made by 
introducing human immunoglobulin loci into transgenic animals, e.g., mice in which the 
endogenous immunoglobulin genes have been partially or completely inactivated. Upon 
challenge, human antibody production is observed, which closely resembles that seen in 
humans in all respects, including gene rearrangement, assembly, and antibody repertoire. This 
approach is described, for example, in U.S. Patent Nos. 5,545,807; 5,545,806; 5,569,825; 
5,625,126; 5,633,425; 5,661,016, and in Marks et al. (Bio/Technology 10, 779-783 (1992)); 
Lonberg et al. (Nature 368, 856-859 (1994)); Morrison (Nature 368, 812-13 (1994)); Fishwild 
et al. (Nature Biotechnology 14, 845-51 (1996)); Neuberger (Nature Biotechnology 14, 826 
(1996)); and Lonberg and Huszar (Intern. Rev. Immunol. 13, 65-93 (1995)). 

Human antibodies may additionally be produced using transgenic nonhuman animals 
which are modified so as to produce fully human antibodies rather than the animal's 
endogenous antibodies in response to challenge by an antigen. (See PCT publication 
WO94/02602). The endogenous genes encoding the heavy and light immunoglobulin chains in 
the nonhuman host have been incapacitated, and active loci encoding human heavy and light 
chain immunoglobulins are inserted into the host's genome. The human genes are 
incorporated, for example, using yeast artificial chromosomes containing the requisite human 
DNA segments. An animal which provides all the desired modifications is then obtained as 
progeny by crossbreeding intermediate transgenic animals containing fewer than the full 
complement of the modifications. The preferred embodiment of such a nonhuman animal is a 
mouse, and is termed the Xenomouse™ as disclosed in PCT publications WO 96/33735 and 
WO 96/34096. This animal produces B cells which secrete fully human immunoglobulins. 
The antibodies can be obtained directly from the animal after immunization with an 
immunogen of interest, as, for example, a preparation of a polyclonal antibody, or alternatively 
from immortalized B cells derived from the animal, such as hybridomas producing 
monoclonal antibodies. Additionally, the genes encoding the immunoglobulins with human 
variable regions can be recovered and expressed to obtain the antibodies directly, or can be 
further modified to obtain analogs of antibodies such as, for example, single chain Fv 
molecules. 

An example of a method of producing a nonhuman host, exemplified as a mouse, 
lacking expression of an endogenous immunoglobulin heavy chain is disclosed in U.S. Patent 
No. 5,939,598. It can be obtained by a method including deleting the J segment genes from at 
least one endogenous heavy chain locus in an embryonic stem cell to prevent rearrangement of 
the locus and to prevent formation of a transcript of a rearranged immunoglobulin heavy chain 
locus, the deletion being effected by a targeting vector containing a gene encoding a selectable 
marker; and producing from the embryonic stem cell a transgenic mouse whose somatic and 
germ cells contain the gene encoding the selectable marker. 

A method for producing an antibody of interest, such as a human antibody, is disclosed 
in U.S. Patent No. 5,916,771. It includes introducing an expression vector that contains a 
nucleotide sequence encoding a heavy chain into one mammalian host cell in culture, 
introducing an expression vector containing a nucleotide sequence encoding a light chain into 
another mammalian host cell, and fusing the two cells to form a hybrid cell. The hybrid cell 
expresses an antibody containing the heavy chain and the light chain. 

In a further improvement on this procedure, a method for identifying a clinically 
relevant epitope on an immunogen, and a correlative method for selecting an antibody that 
binds immunospecifically to the relevant epitope with high affinity, are disclosed in PCT 
publication WO 99/53049. 

Fab Fragments and Single Chain Antibodies 

According to the invention, techniques can be adapted for the production of 
single-chain antibodies specific to an antigenic protein of the invention (see e.g., U.S. Patent 
No. 4,946,778). In addition, methods can be adapted for the construction of Fab expression 
libraries (see e.g., Huse, et al., 1989 Science 246: 1275-1281) to allow rapid and effective 
identification of monoclonal Fab fragments with the desired specificity for a protein or 
derivatives, fragments, analogs or homologs thereof. Antibody fragments that contain the 
idiotypes to a protein antigen may be produced by techniques known in the art including, but 
not limited to: (i) an F(ab')2 fragment produced by pepsin digestion of an antibody molecule; (ii) 
an Fab fragment generated by reducing the disulfide bridges of an F(ab')2 fragment; (iii) an Fab 
fragment generated by the treatment of the antibody molecule with papain and a reducing 
agent; and (iv) Fv fragments. 

Bispecific Antibodies 

Bispecific antibodies are monoclonal, preferably human or humanized, antibodies that 
have binding specificities for at least two different antigens. In the present case, one of the 
binding specificities is for an antigenic protein of the invention. The second binding target is 
any other antigen, and advantageously is a cell-surface protein or receptor or receptor subunit. 
Methods for making bispecific antibodies are known in the art. Traditionally, the recombinant 
production of bispecific antibodies is based on the co-expression of two immunoglobulin 
heavy-chain/light-chain pairs, where the two heavy chains have different specificities 
(Milstein and Cuello, Nature, 305:537-539 (1983)). Because of the random assortment of 
immunoglobulin heavy and light chains, these hybridomas (quadromas) produce a potential 
mixture of ten different antibody molecules, of which only one has the correct bispecific 
structure. The purification of the correct molecule is usually accomplished by affinity 
chromatography steps. Similar procedures are disclosed in WO 93/08829, published 13 May 
1993, and in Traunecker et al., EMBO J., 10:3655-3659 (1991). 
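
The figure of ten different molecules can be checked by enumeration. The following Python sketch assumes, for illustration only, that a quadroma expresses two heavy chains and two light chains, that either light chain can pair with either heavy chain, and that any two heavy-chain/light-chain half-molecules can associate. The chain labels (H1, H2, L1, L2) and the function names are hypothetical.

```python
from itertools import combinations_with_replacement, product

def quadroma_species(heavy=("H1", "H2"), light=("L1", "L2")):
    """Enumerate distinct antibodies formed by random chain assortment."""
    # Each half-molecule is one heavy chain paired with one light chain.
    half_molecules = list(product(heavy, light))
    # An antibody is an unordered pair of half-molecules.
    return list(combinations_with_replacement(half_molecules, 2))

def is_correct_bispecific(antibody):
    """True only when each heavy chain carries its cognate light chain."""
    return set(antibody) == {("H1", "L1"), ("H2", "L2")}

species = quadroma_species()
correct = [ab for ab in species if is_correct_bispecific(ab)]
print(len(species), len(correct))
```

Under these assumptions the four possible half-molecules combine into ten unordered pairs, of which exactly one carries both cognate heavy-chain/light-chain pairings.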

Antibody variable domains with the desired binding specificities (antibody-antigen 
combining sites) can be fused to immunoglobulin constant domain sequences. The fusion 
preferably is with an immunoglobulin heavy-chain constant domain, comprising at least part 
of the hinge, CH2, and CH3 regions. It is preferred to have the first heavy-chain constant 
region (CH1) containing the site necessary for light-chain binding present in at least one of the 
fusions. DNAs encoding the immunoglobulin heavy-chain fusions and, if desired, the 
immunoglobulin light chain, are inserted into separate expression vectors, and are 
co-transfected into a suitable host organism. For further details of generating bispecific 
antibodies see, for example, Suresh et al., Methods in Enzymology, 121:210 (1986). 

According to another approach described in WO 96/27011, the interface between a pair 
of antibody molecules can be engineered to maximize the percentage of heterodimers which 
are recovered from recombinant cell culture. The preferred interface comprises at least a part 
of the CH3 region of an antibody constant domain. In this method, one or more small amino 
acid side chains from the interface of the first antibody molecule are replaced with larger side 
chains (e.g. tyrosine or tryptophan). Compensatory "cavities" of identical or similar size to the 
large side chain(s) are created on the interface of the second antibody molecule by replacing 
large amino acid side chains with smaller ones (e.g. alanine or threonine). This provides a 
mechanism for increasing the yield of the heterodimer over other unwanted end-products such 
as homodimers. 

Bispecific antibodies can be prepared as full length antibodies or antibody fragments (e.g. 
F(ab')2 bispecific antibodies). Techniques for generating bispecific antibodies from antibody 
fragments have been described in the literature. For example, bispecific antibodies can be 
prepared using chemical linkage. Brennan et al., Science 229:81 (1985) describe a procedure 
wherein intact antibodies are proteolytically cleaved to generate F(ab')2 fragments. These 
fragments are reduced in the presence of the dithiol complexing agent sodium arsenite to 
stabilize vicinal dithiols and prevent intermolecular disulfide formation. The Fab' fragments 
generated are then converted to thionitrobenzoate (TNB) derivatives. One of the Fab'-TNB 
derivatives is then reconverted to the Fab'-thiol by reduction with mercaptoethylamine and is 
mixed with an equimolar amount of the other Fab'-TNB derivative to form the bispecific 
antibody. The bispecific antibodies produced can be used as agents for the selective 
immobilization of enzymes. 

Additionally, Fab' fragments can be directly recovered from E. coli and chemically 
coupled to form bispecific antibodies. Shalaby et al., J. Exp. Med. 175:217-225 (1992) 
describe the production of a fully humanized bispecific antibody F(ab')2 molecule. Each Fab' 
fragment was separately secreted from E. coli and subjected to directed chemical coupling in 
vitro to form the bispecific antibody. The bispecific antibody thus formed was able to bind to 
cells overexpressing the ErbB2 receptor and normal human T cells, as well as trigger the lytic 
activity of human cytotoxic lymphocytes against human breast tumor targets. 

Various techniques for making and isolating bispecific antibody fragments directly 
from recombinant cell culture have also been described. For example, bispecific antibodies 
have been produced using leucine zippers. Kostelny et al., J. Immunol. 148(5):1547-1553 
(1992). The leucine zipper peptides from the Fos and Jun proteins were linked to the Fab' 
portions of two different antibodies by gene fusion. The antibody homodimers were reduced 
at the hinge region to form monomers and then re-oxidized to form the antibody heterodimers. 
This method can also be utilized for the production of antibody homodimers. The "diabody" 
technology described by Hollinger et al., Proc. Natl. Acad. Sci. USA 90:6444-6448 (1993) has 
provided an alternative mechanism for making bispecific antibody fragments. The fragments 
comprise a heavy-chain variable domain (VH) connected to a light-chain variable domain (VL) 
by a linker which is too short to allow pairing between the two domains on the same chain. 
Accordingly, the VH and VL domains of one fragment are forced to pair with the 
complementary VL and VH domains of another fragment, thereby forming two antigen-binding 
sites. Another strategy for making bispecific antibody fragments by the use of single-chain Fv 
(sFv) dimers has also been reported. See, Gruber et al., J. Immunol. 152:5368 (1994). 
Antibodies with more than two valencies are contemplated. For example, trispecific 
antibodies can be prepared. Tutt et al., J. Immunol. 147:60 (1991). 

Exemplary bispecific antibodies can bind to two different epitopes, at least one of 
which originates in the protein antigen of the invention. Alternatively, an anti-antigenic arm 
of an immunoglobulin molecule can be combined with an arm which binds to a triggering 
molecule on a leukocyte such as a T-cell receptor molecule (e.g. CD2, CD3, CD28, or B7), or 
Fc receptors for IgG (FcγR), such as FcγRI (CD64), FcγRII (CD32) and FcγRIII (CD16) so as 
to focus cellular defense mechanisms to the cell expressing the particular antigen. Bispecific 
antibodies can also be used to direct cytotoxic agents to cells which express a particular 
antigen. These antibodies possess an antigen-binding arm and an arm which binds a cytotoxic 
agent or a radionuclide chelator, such as EOTUBE, DPTA, DOTA, or TETA. Another 
bispecific antibody of interest binds the protein antigen described herein and further binds 
tissue factor (TF). 



Heteroconjugate Antibodies 

Heteroconjugate antibodies are also within the scope of the present invention. 
Heteroconjugate antibodies are composed of two covalently joined antibodies. Such 
antibodies have, for example, been proposed to target immune system cells to unwanted cells 
(U.S. Patent No. 4,676,980), and for treatment of HIV infection (WO 91/00360; WO 
92/200373; EP 03089). It is contemplated that the antibodies can be prepared in vitro using 
known methods in synthetic protein chemistry, including those involving crosslinking agents. 
For example, immunotoxins can be constructed using a disulfide exchange reaction or by 
forming a thioether bond. Examples of suitable reagents for this purpose include iminothiolate 
and methyl-4-mercaptobutyrimidate and those disclosed, for example, in U.S. Patent No. 
4,676,980. 

Effector Function Engineering 

It can be desirable to modify the antibody of the invention with respect to effector 
function, so as to enhance, e.g., the effectiveness of the antibody in treating cancer. For 
example, cysteine residue(s) can be introduced into the Fc region, thereby allowing interchain 
disulfide bond formation in this region. The homodimeric antibody thus generated can have 
improved internalization capability and/or increased complement-mediated cell killing and 
antibody-dependent cellular cytotoxicity (ADCC). See Caron et al., J. Exp. Med., 176: 1191-1195 
(1992) and Shopes, J. Immunol., 148: 2918-2922 (1992). Homodimeric antibodies with 
enhanced anti-tumor activity can also be prepared using heterobifunctional cross-linkers as 
described in Wolff et al., Cancer Research, 53: 2560-2565 (1993). Alternatively, an antibody 
can be engineered that has dual Fc regions and can thereby have enhanced complement lysis 
and ADCC capabilities. See Stevenson et al., Anti-Cancer Drug Design, 3: 219-230 (1989). 

Immunoconjugates 

The invention also pertains to immunoconjugates comprising an antibody conjugated 
to a cytotoxic agent such as a chemotherapeutic agent, toxin (e.g., an enzymatically active 
toxin of bacterial, fungal, plant, or animal origin, or fragments thereof), or a radioactive 
isotope (i.e., a radioconjugate). 

Chemotherapeutic agents useful in the generation of such immunoconjugates have 
been described above. Enzymatically active toxins and fragments thereof that can be used 
include diphtheria A chain, nonbinding active fragments of diphtheria toxin, exotoxin A chain 
(from Pseudomonas aeruginosa), ricin A chain, abrin A chain, modeccin A chain, alpha-sarcin, 
Aleurites fordii proteins, dianthin proteins, Phytolacca americana proteins (PAPI, PAPII, and 
PAP-S), Momordica charantia inhibitor, curcin, crotin, Saponaria officinalis inhibitor, 
gelonin, mitogellin, restrictocin, phenomycin, enomycin, and the trichothecenes. A variety of 
radionuclides are available for the production of radioconjugated antibodies. Examples 
include ²¹²Bi, ¹³¹I, ¹³¹In, ⁹⁰Y, and ¹⁸⁶Re. 

Conjugates of the antibody and cytotoxic agent are made using a variety of 
bifunctional protein-coupling agents such as N-succinimidyl-3-(2-pyridyldithiol) propionate 
(SPDP), iminothiolane (IT), bifunctional derivatives of imidoesters (such as dimethyl 
adipimidate HCl), active esters (such as disuccinimidyl suberate), aldehydes (such as 
glutaraldehyde), bis-azido compounds (such as bis (p-azidobenzoyl) hexanediamine), 
bis-diazonium derivatives (such as bis-(p-diazoniumbenzoyl)-ethylenediamine), diisocyanates 
(such as tolylene 2,6-diisocyanate), and bis-active fluorine compounds (such as 
1,5-difluoro-2,4-dinitrobenzene). For example, a ricin immunotoxin can be prepared as described in 
Vitetta et al., Science, 238: 1098 (1987). Carbon-14-labeled 1-isothiocyanatobenzyl-3-methyldiethylene 
triaminepentaacetic acid (MX-DTPA) is an exemplary chelating agent for 
conjugation of radionuclide to the antibody. See WO94/11026. 

In another embodiment, the antibody can be conjugated to a "receptor" (such as 
streptavidin) for utilization in tumor pretargeting wherein the antibody-receptor conjugate is 
administered to the patient, followed by removal of unbound conjugate from the circulation 
using a clearing agent and then administration of a "ligand" (e.g., avidin) that is in turn 
conjugated to a cytotoxic agent. 

Immunoliposomes 

The antibodies disclosed herein can also be formulated as immunoliposomes. 
Liposomes containing the antibody are prepared by methods known in the art, such as 
described in Epstein et al., Proc. Natl. Acad. Sci. USA, 82: 3688 (1985); Hwang et al., Proc. 
Natl. Acad. Sci. USA, 77: 4030 (1980); and U.S. Pat. Nos. 4,485,045 and 4,544,545. 
Liposomes with enhanced circulation time are disclosed in U.S. Patent No. 5,013,556. 

Particularly useful liposomes can be generated by the reverse-phase evaporation 
method with a lipid composition comprising phosphatidylcholine, cholesterol, and PEG- 
derivatized phosphatidylethanolamine (PEG-PE). Liposomes are extruded through filters of 
defined pore size to yield liposomes with the desired diameter. Fab' fragments of the antibody 
of the present invention can be conjugated to the liposomes as described in Martin et al., J. 
Biol. Chem., 257: 286-288 (1982) via a disulfide-interchange reaction. A chemotherapeutic 
agent (such as Doxorubicin) is optionally contained within the liposome. See Gabizon et al., J. 
National Cancer Inst., 81(19): 1484 (1989). 

Diagnostic Applications of Antibodies Directed Against the Proteins of the Invention 

Antibodies directed against a protein of the invention may be used in methods known 
within the art relating to the localization and/or quantitation of the protein (e.g., for use in 
measuring levels of the protein within appropriate physiological samples, for use in diagnostic 
methods, for use in imaging the protein, and the like). In a given embodiment, antibodies 
against the proteins, or derivatives, fragments, analogs or homologs thereof, that contain the 
antigen binding domain, are utilized as pharmacologically-active compounds (see below). 

An antibody specific for a protein of the invention can be used to isolate the protein by 
standard techniques, such as immunoaffinity chromatography or immunoprecipitation. Such 
an antibody can facilitate the purification of the natural protein antigen from cells and of 
recombinantly produced antigen expressed in host cells. Moreover, such an antibody can be 
used to detect the antigenic protein (e.g., in a cellular lysate or cell supernatant) in order to 
evaluate the abundance and pattern of expression of the antigenic protein. Antibodies directed 
against the protein can be used diagnostically to monitor protein levels in tissue as part of a 
clinical testing procedure, e.g., to determine the efficacy of a given treatment 
regimen. Detection can be facilitated by coupling (i.e., physically linking) the antibody to a 
detectable substance. Examples of detectable substances include various enzymes, prosthetic 
groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive 
materials. Examples of suitable enzymes include horseradish peroxidase, alkaline 
phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group 
complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent 
materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, 
dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a 
luminescent material is luminol; examples of bioluminescent materials include 
luciferase, luciferin, and aequorin; and examples of suitable radioactive material include ¹²⁵I, 
¹³¹I, ³⁵S or ³H. 



Antibody Therapeutics 

Antibodies of the invention, including polyclonal, monoclonal, humanized and fully 
human antibodies, may be used as therapeutic agents. Such agents will generally be employed to 
treat or prevent a disease or pathology in a subject. An antibody preparation, preferably one 
having high specificity and high affinity for its target antigen, is administered to the subject 
and will generally have an effect due to its binding with the target. Such an effect may be one 
of two kinds, depending on the specific nature of the interaction between the given antibody 
molecule and the target antigen in question. In the first instance, administration of the 
antibody may abrogate or inhibit the binding of the target with an endogenous ligand to which 
it naturally binds. In this case, the antibody binds to the target and masks a binding site of the 
naturally occurring ligand, wherein the ligand serves as an effector molecule. Thus the 
receptor mediates a signal transduction pathway for which ligand is responsible. 

Alternatively, the effect may be one in which the antibody elicits a physiological result 
by virtue of binding to an effector binding site on the target molecule. In this case the target, a 
receptor having an endogenous ligand which may be absent or defective in the disease or 
pathology, binds the antibody as a surrogate effector ligand, initiating a receptor-based signal 
transduction event by the receptor. 

A therapeutically effective amount of an antibody of the invention relates generally to 
the amount needed to achieve a therapeutic objective. As noted above, this may be a binding 
interaction between the antibody and its target antigen that, in certain cases, interferes with the 
functioning of the target, and in other cases, promotes a physiological response. The amount 
required to be administered will furthermore depend on the binding affinity of the antibody for 
its specific antigen, and will also depend on the rate at which an administered antibody is 
depleted from the free volume of the subject to which it is administered. Common ranges for 
therapeutically effective dosing of an antibody or antibody fragment of the invention may be, 
by way of nonlimiting example, from about 0.1 mg/kg body weight to about 50 mg/kg body 
weight. Common dosing frequencies may range, for example, from twice daily to once a 
week. 
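
As a worked illustration of the weight-based range stated above, the following Python sketch converts the nonlimiting range of about 0.1 mg/kg to about 50 mg/kg into total amounts for a subject of a given body weight. The function name and the 70 kg body weight are hypothetical values chosen only for the arithmetic.

```python
def dose_range_mg(body_weight_kg, low_mg_per_kg=0.1, high_mg_per_kg=50.0):
    """Return (minimum, maximum) total dose in mg for a mg/kg dosing range."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return (low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg)

# Hypothetical 70 kg subject, used only to illustrate the calculation.
low, high = dose_range_mg(70.0)
print(f"{low:.1f} mg to {high:.1f} mg per administration")
```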



Pharmaceutical Compositions of Antibodies 

Antibodies specifically binding a protein of the invention, as well as other molecules 
identified by the screening assays disclosed herein, can be administered for the treatment of 
various disorders in the form of pharmaceutical compositions. Principles and considerations 
involved in preparing such compositions, as well as guidance in the choice of components are 
provided, for example, in Remington: The Science And Practice Of Pharmacy 19th ed. 
(Alfonso R. Gennaro, et al., editors) Mack Pub. Co., Easton, Pa.: 1995; Drug Absorption 
Enhancement: Concepts, Possibilities, Limitations, And Trends, Harwood Academic 
Publishers, Langhorne, Pa., 1994; and Peptide And Protein Drug Delivery (Advances In 
Parenteral Sciences, Vol. 4), 1991, M. Dekker, New York. 

If the antigenic protein is intracellular and whole antibodies are used as inhibitors, 
internalizing antibodies are preferred. However, liposomes can also be used to deliver the 
antibody, or an antibody fragment, into cells. Where antibody fragments are used, the smallest 
inhibitory fragment that specifically binds to the binding domain of the target protein is 
preferred. For example, based upon the variable-region sequences of an antibody, peptide 
molecules can be designed that retain the ability to bind the target protein sequence. Such 
peptides can be synthesized chemically and/or produced by recombinant DNA technology. 
See, e.g., Marasco et al., Proc. Natl. Acad. Sci. USA, 90: 7889-7893 (1993). The formulation 
herein can also contain more than one active compound as necessary for the particular 
indication being treated, preferably those with complementary activities that do not adversely 
affect each other. Alternatively, or in addition, the composition can comprise an agent that 
enhances its function, such as, for example, a cytotoxic agent, cytokine, chemotherapeutic 
agent, or growth-inhibitory agent. Such molecules are suitably present in combination in 
amounts that are effective for the purpose intended. 

The active ingredients can also be entrapped in microcapsules prepared, for example, 
by coacervation techniques or by interfacial polymerization, for example, 
hydroxymethylcellulose or gelatin-microcapsules and poly-(methylmethacrylate) 
microcapsules, respectively, in colloidal drug delivery systems (for example, liposomes, 
albumin microspheres, microemulsions, nano-particles, and nanocapsules) or in 
macroemulsions. 

The formulations to be used for in vivo administration must be sterile. This is readily 
accomplished by filtration through sterile filtration membranes. 

Sustained-release preparations can be prepared. Suitable examples of sustained-release 
preparations include semipermeable matrices of solid hydrophobic polymers containing the 
antibody, which matrices are in the form of shaped articles, e.g., films, or microcapsules. 
Examples of sustained-release matrices include polyesters, hydrogels (for example, 
poly(2-hydroxyethyl-methacrylate), or poly(vinylalcohol)), polylactides (U.S. Pat. No. 3,773,919), 
copolymers of L-glutamic acid and γ-ethyl-L-glutamate, non-degradable ethylene-vinyl 
acetate, degradable lactic acid-glycolic acid copolymers such as the LUPRON DEPOT™ 
(injectable microspheres composed of lactic acid-glycolic acid copolymer and leuprolide 
acetate), and poly-D-(-)-3-hydroxybutyric acid. While polymers such as ethylene-vinyl 
acetate and lactic acid-glycolic acid enable release of molecules for over 100 days, certain 
hydrogels release proteins for shorter time periods. 

ELISA Assay 

An agent for detecting an analyte protein is an antibody capable of binding to an 
analyte protein, preferably an antibody with a detectable label. Antibodies can be polyclonal, 
or more preferably, monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or F(ab')2) 
can be used. The term "labeled", with regard to the probe or antibody, is intended to 
encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a 
detectable substance to the probe or antibody, as well as indirect labeling of the probe or 
antibody by reactivity with another reagent that is directly labeled. Examples of indirect 
labeling include detection of a primary antibody using a fluorescently-labeled secondary 
antibody and end-labeling of a DNA probe with biotin such that it can be detected with 
fluorescently-labeled streptavidin. The term "biological sample" is intended to include tissues, 
cells and biological fluids isolated from a subject, as well as tissues, cells and fluids present 
within a subject. Included within the usage of the term "biological sample", therefore, is 
blood and a fraction or component of blood including blood serum, blood plasma, or lymph. 
That is, the detection method of the invention can be used to detect an analyte mRNA, protein, 
or genomic DNA in a biological sample in vitro as well as in vivo. For example, in vitro 
techniques for detection of an analyte mRNA include Northern hybridizations and in situ 
hybridizations. In vitro techniques for detection of an analyte protein include enzyme linked 
immunosorbent assays (ELISAs), Western blots, immunoprecipitations, and 
immunofluorescence. In vitro techniques for detection of an analyte genomic DNA include 
Southern hybridizations. Procedures for conducting immunoassays are described, for example 
in "ELISA: Theory and Practice: Methods in Molecular Biology", Vol. 42, J. R. Crowther 
(Ed.) Humana Press, Totowa, NJ, 1995; "Immunoassay", E. Diamandis and T. Christopoulos, 
Academic Press, Inc., San Diego, CA, 1996; and "Practice and Theory of Enzyme 
Immunoassays", P. Tijssen, Elsevier Science Publishers, Amsterdam, 1985. Furthermore, in 
vivo techniques for detection of an analyte protein include introducing into a subject a labeled 
anti-analyte protein antibody. For example, the antibody can be labeled with a radioactive 
marker whose presence and location in a subject can be detected by standard imaging 
techniques. 

NOVX Recombinant Expression Vectors and Host Cells 

Another aspect of the invention pertains to vectors, preferably expression vectors, 
containing a nucleic acid encoding an NOVX protein, or derivatives, fragments, analogs or 
homologs thereof. As used herein, the term "vector" refers to a nucleic acid molecule capable 
of transporting another nucleic acid to which it has been linked. One type of vector is a 
"plasmid", which refers to a circular double stranded DNA loop into which additional DNA 
segments can be ligated. Another type of vector is a viral vector, wherein additional DNA 
segments can be ligated into the viral genome. Certain vectors are capable of autonomous 
replication in a host cell into which they are introduced (e.g., bacterial vectors having a 
bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., 
non-episomal mammalian vectors) are integrated into the genome of a host cell upon 
introduction into the host cell, and thereby are replicated along with the host genome. 
Moreover, certain vectors are capable of directing the expression of genes to which they are 
operatively-linked. Such vectors are referred to herein as "expression vectors". In general, 
expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. 
In the present specification, "plasmid" and "vector" can be used interchangeably as the 
plasmid is the most commonly used form of vector. However, the invention is intended to 
include such other forms of expression vectors, such as viral vectors (e.g., replication defective 
retroviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions. 

The recombinant expression vectors of the invention comprise a nucleic acid of the 
invention in a form suitable for expression of the nucleic acid in a host cell, which means that 
the recombinant expression vectors include one or more regulatory sequences, selected on the 
basis of the host cells to be used for expression, that are operatively-linked to the nucleic acid 
sequence to be expressed. Within a recombinant expression vector, "operatively-linked" is 
intended to mean that the nucleotide sequence of interest is linked to the regulatory 
sequence(s) in a manner that allows for expression of the nucleotide sequence (e.g., in an in 
vitro transcription/translation system or in a host cell when the vector is introduced into the 
host cell). 

The term "regulatory sequence" is intended to include promoters, enhancers and other 
expression control elements (e.g., polyadenylation signals). Such regulatory sequences are 
described, for example, in Goeddel, Gene Expression Technology: Methods in 
Enzymology 185, Academic Press, San Diego, Calif. (1990). Regulatory sequences include 
those that direct constitutive expression of a nucleotide sequence in many types of host cell 
and those that direct expression of the nucleotide sequence only in certain host cells (e.g., 
tissue-specific regulatory sequences). It will be appreciated by those skilled in the art that the 
design of the expression vector can depend on such factors as the choice of the host cell to be 
transformed, the level of expression of protein desired, etc. The expression vectors of the 
invention can be introduced into host cells to thereby produce proteins or peptides, including 
fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., NOVX 
proteins, mutant forms of NOVX proteins, fusion proteins, etc.). 

The recombinant expression vectors of the invention can be designed for expression of 
NOVX proteins in prokaryotic or eukaryotic cells. For example, NOVX proteins can be 
expressed in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression 
vectors), yeast cells or mammalian cells. Suitable host cells are discussed further in Goeddel, 
Gene Expression Technology: Methods in Enzymology 185, Academic Press, San 
Diego, Calif. (1990). Alternatively, the recombinant expression vector can be transcribed and 
translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase. 
Expression of proteins in prokaryotes is most often carried out in Escherichia coli with vectors 
containing constitutive or inducible promoters directing the expression of either fusion or 
non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein, 
usually to the amino terminus of the recombinant protein. Such fusion vectors typically serve 
three purposes: (i) to increase expression of recombinant protein; (ii) to increase the solubility 
of the recombinant protein; and (iii) to aid in the purification of the recombinant protein by 
acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic 
cleavage site is introduced at the junction of the fusion moiety and the recombinant protein to 
enable separation of the recombinant protein from the fusion moiety subsequent to purification 
of the fusion protein. Such enzymes, and their cognate recognition sequences, include Factor 
Xa, thrombin and enterokinase. Typical fusion expression vectors include pGEX (Pharmacia 
Biotech Inc; Smith and Johnson, 1988. Gene 67: 31-40), pMAL (New England Biolabs, 
Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) that fuse glutathione S-transferase 
(GST), maltose E binding protein, or protein A, respectively, to the target recombinant protein. 
Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann et 
al., (1988) Gene 69:301-315) and pET 11d (Studier et al., Gene Expression Technology: 
Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990) 60-89). 
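
By way of non-limiting illustration, the separation of a recombinant protein from its fusion moiety at a proteolytic cleavage site can be sketched in Python as follows. The recognition motifs shown (IEGR for Factor Xa, LVPRGS for thrombin, DDDDK for enterokinase) are commonly cited consensus sequences supplied here as assumptions and are not recited in this specification. The tag and target residues are placeholders, and the function names are hypothetical.

```python
# Illustrative protease recognition motifs (assumed; not given in the text).
# Each entry: (motif, number of motif residues left on the N-terminal side).
CLEAVAGE_SITES = {
    "factor_xa": ("IEGR", 4),      # cut after the Arg of Ile-Glu-Gly-Arg
    "thrombin": ("LVPRGS", 4),     # cut between Arg and Gly of Leu-Val-Pro-Arg-Gly-Ser
    "enterokinase": ("DDDDK", 5),  # cut after the Lys of Asp-Asp-Asp-Asp-Lys
}


def build_fusion(tag: str, protease: str, target: str) -> str:
    """Join an N-terminal fusion moiety, a cleavage motif, and the target protein."""
    motif, _ = CLEAVAGE_SITES[protease]
    return tag + motif + target


def cleave(fusion: str, protease: str) -> tuple[str, str]:
    """Split a fusion protein at the first occurrence of the protease motif."""
    motif, cut_offset = CLEAVAGE_SITES[protease]
    start = fusion.find(motif)
    if start < 0:
        raise ValueError(f"no {protease} site found")
    cut = start + cut_offset
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    tag = "MSPILGYWK"      # placeholder residues standing in for a fusion moiety
    target = "MKTAYIAKQR"  # placeholder target sequence
    fusion = build_fusion(tag, "thrombin", target)
    tag_part, released = cleave(fusion, "thrombin")
    print(tag_part, released)
```

Under these assumptions, a thrombin site leaves two extra residues (Gly-Ser) on the released protein, whereas the Factor Xa and enterokinase motifs remain entirely with the fusion moiety. This is one consideration when choosing among the three enzymes.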

One strategy to maximize recombinant protein expression in E. coli is to express the 
protein in a host bacteria with an impaired capacity to proteolytically cleave the recombinant 
protein. See, e.g., Gottesman, Gene Expression Technology: Methods in Enzymology 
185, Academic Press, San Diego, Calif. (1990) 119-128. Another strategy is to alter the 
nucleic acid sequence of the nucleic acid to be inserted into an expression vector so that the 
individual codons for each amino acid are those preferentially utilized in E. coli (see, e.g., 
Wada, et al., 1992. Nucl. Acids Res. 20: 2111-2118). Such alteration of nucleic acid 
sequences of the invention can be carried out by standard DNA synthesis techniques. 
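
The codon-alteration strategy can be sketched, purely as an illustration, by back-translating a protein sequence with one preferred codon per amino acid. The codon choices below are an assumed, simplified table of codons often described as favored in E. coli. They are not taken from this specification or from the cited reference, and a practical design would rely on a measured codon-usage table for the host strain. The function name is hypothetical.

```python
# One commonly preferred E. coli codon per amino acid (illustrative assumption;
# a real design would use a measured codon-usage table for the chosen strain).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop codon
}


def recode_for_e_coli(protein: str) -> str:
    """Back-translate a one-letter protein sequence using the preferred codons."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None


if __name__ == "__main__":
    print(recode_for_e_coli("MKL*"))  # ATGAAACTGTAA
```

The encoded amino acid sequence is unchanged by this substitution; only the choice among synonymous codons differs.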

In another embodiment, the NOVX expression vector is a yeast expression vector. 
Examples of vectors for expression in yeast Saccharomyces cerevisiae include pYepSec1 
(Baldari, et al., 1987. EMBO J. 6: 229-234), pMFa (Kurjan and Herskowitz, 1982. Cell 30: 
933-943), pJRY88 (Schultz et al., 1987. Gene 54: 113-123), pYES2 (Invitrogen Corporation, 
San Diego, Calif.), and picZ (Invitrogen Corp, San Diego, Calif.). 

Alternatively, NOVX can be expressed in insect cells using baculovirus expression vectors. 
Baculovirus vectors available for expression of proteins in cultured insect cells (e.g., Sf9 
cells) include the pAc series (Smith, et al., 1983. Mol. Cell. Biol. 3: 2156-2165) and the pVL 
series (Luckow and Summers, 1989. Virology 170: 31-39). 

In yet another embodiment, a nucleic acid of the invention is expressed in mammalian 
cells using a mammalian expression vector. Examples of mammalian expression vectors 
include pCDM8 (Seed, 1987. Nature 329: 840) and pMT2PC (Kaufman, et al., 1987. EMBO 
J. 6: 187-195). When used in mammalian cells, the expression vector's control functions are 
often provided by viral regulatory elements. For example, commonly used promoters are 
derived from polyoma, adenovirus 2, cytomegalovirus, and simian virus 40. For other suitable 
expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16 and 17 of 
Sambrook, et al, Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring 
Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989. 

In another embodiment, the recombinant mammalian expression vector is capable of 
directing expression of the nucleic acid preferentially in a particular cell type (e.g., 
tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific 
regulatory elements are known in the art. Non-limiting examples of suitable tissue-specific 
promoters include the albumin promoter (liver-specific; Pinkert, et al., 1987. Genes Dev. 1: 
268-277), lymphoid-specific promoters (Calame and Eaton, 1988. Adv. Immunol. 43: 
235-275), in particular promoters of T cell receptors (Winoto and Baltimore, 1989. EMBO J. 
8: 729-733) and immunoglobulins (Banerji, et al., 1983. Cell 33: 729-740; Queen and 
Baltimore, 1983. Cell 33: 741-748), neuron-specific promoters (e.g., the neurofilament 
promoter; Byrne and Ruddle, 1989. Proc. Natl. Acad. Sci. USA 86: 5473-5477), 
pancreas-specific promoters (Edlund, et al., 1985. Science 230: 912-916), and mammary 
gland-specific promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and European 
Application Publication No. 264,166). Developmentally-regulated promoters are also 
encompassed, e.g., the murine hox promoters (Kessel and Gruss, 1990. Science 249: 374-379) 
and the α-fetoprotein promoter (Camper and Tilghman, 1989. Genes Dev. 3: 537-546). 

The invention further provides a recombinant expression vector comprising a DNA 
molecule of the invention cloned into the expression vector in an antisense orientation. That 
is, the DNA molecule is operatively-linked to a regulatory sequence in a manner that allows 
for expression (by transcription of the DNA molecule) of an RNA molecule that is antisense to 
NOVX mRNA. Regulatory sequences operatively linked to a nucleic acid cloned in the 
antisense orientation can be chosen that direct the continuous expression of the antisense RNA 
molecule in a variety of cell types, for instance viral promoters and/or enhancers, or regulatory 
sequences can be chosen that direct constitutive, tissue specific or cell type specific expression 
of antisense RNA. The antisense expression vector can be in the form of a recombinant 
plasmid, phagemid or attenuated virus in which antisense nucleic acids are produced under the 
control of a high efficiency regulatory region, the activity of which can be determined by the 
cell type into which the vector is introduced. For a discussion of the regulation of gene 
expression using antisense genes see, e.g., Weintraub, et al., "Antisense RNA as a molecular 
tool for genetic analysis," Reviews-Trends in Genetics, Vol. 1(1) 1986. 

Another aspect of the invention pertains to host cells into which a recombinant 
expression vector of the invention has been introduced. The terms "host cell" and 
"recombinant host cell" are used interchangeably herein. It is understood that such terms refer 
not only to the particular subject cell but also to the progeny or potential progeny of such a 
cell. Because certain modifications may occur in succeeding generations due to either 
mutation or environmental influences, such progeny may not, in fact, be identical to the parent 
cell, but are still included within the scope of the term as used herein. 

A host cell can be any prokaryotic or eukaryotic cell. For example, NOVX protein can 
be expressed in bacterial cells such as E. coli, insect cells, yeast or mammalian cells (such as 
Chinese hamster ovary cells (CHO) or COS cells). Other suitable host cells are known to 
those skilled in the art. 

Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional 
transformation or transfection techniques. As used herein, the terms "transformation" and 
"transfection" are intended to refer to a variety of art-recognized techniques for introducing 
foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium 
chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or 
electroporation. Suitable methods for transforming or transfecting host cells can be found in 
Sambrook, et al. (Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring 
Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989), 
and other laboratory manuals. 

For stable transfection of mammalian cells, it is known that, depending upon the 
expression vector and transfection technique used, only a small fraction of cells may integrate 
the foreign DNA into their genome. In order to identify and select these integrants, a gene that 
encodes a selectable marker (e.g., resistance to antibiotics) is generally introduced into the 
host cells along with the gene of interest. Various selectable markers include those that confer 
resistance to drugs, such as G418, hygromycin and methotrexate. Nucleic acid encoding a 
selectable marker can be introduced into a host cell on the same vector as that encoding 
NOVX or can be introduced on a separate vector. Cells stably transfected with the introduced 
nucleic acid can be identified by drug selection (e.g., cells that have incorporated the 
selectable marker gene will survive, while the other cells die). 

A host cell of the invention, such as a prokaryotic or eukaryotic host cell in culture, can 
be used to produce (i.e., express) NOVX protein. Accordingly, the invention further provides 
methods for producing NOVX protein using the host cells of the invention. In one 
embodiment, the method comprises culturing the host cell of the invention (into which a 
recombinant expression vector encoding NOVX protein has been introduced) in a suitable 
medium such that NOVX protein is produced. In another embodiment, the method further 
comprises isolating NOVX protein from the medium or the host cell. 

Transgenic NOVX Animals 

The host cells of the invention can also be used to produce non-human transgenic 
animals. For example, in one embodiment, a host cell of the invention is a fertilized oocyte or 
an embryonic stem cell into which NOVX protein-coding sequences have been introduced. 
Such host cells can then be used to create non-human transgenic animals in which exogenous 
NOVX sequences have been introduced into their genome or homologous recombinant 
animals in which endogenous NOVX sequences have been altered. Such animals are useful 
for studying the function and/or activity of NOVX protein and for identifying and/or 
evaluating modulators of NOVX protein activity. As used herein, a "transgenic animal" is a 
non-human animal, preferably a mammal, more preferably a rodent such as a rat or mouse, in 
which one or more of the cells of the animal includes a transgene. Other examples of 
transgenic animals include non-human primates, sheep, dogs, cows, goats, chickens, 
amphibians, etc. A transgene is exogenous DNA that is integrated into the genome of a cell 
from which a transgenic animal develops and that remains in the genome of the mature 
animal, thereby directing the expression of an encoded gene product in one or more cell types 
or tissues of the transgenic animal. As used herein, a "homologous recombinant animal" is a 
non-human animal, preferably a mammal, more preferably a mouse, in which an endogenous 
NOVX gene has been altered by homologous recombination between the endogenous gene 
and an exogenous DNA molecule introduced into a cell of the animal, e.g., an embryonic cell 
of the animal, prior to development of the animal. 

A transgenic animal of the invention can be created by introducing NOVX-encoding 
nucleic acid into the male pronuclei of a fertilized oocyte (e.g., by microinjection, retroviral 
infection) and allowing the oocyte to develop in a pseudopregnant female foster animal. The 
human NOVX cDNA sequences SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178, 
can be introduced as a transgene into the genome of a non-human animal. Alternatively, a 
non-human homologue of the human NOVX gene, such as a mouse NOVX gene, can be 
isolated based on hybridization to the human NOVX cDNA (described further supra) and used 
as a transgene. Intronic sequences and polyadenylation signals can also be included in the 
transgene to increase the efficiency of expression of the transgene. A tissue-specific 
regulatory sequence(s) can be operably-linked to the NOVX transgene to direct expression of 
NOVX protein to particular cells. Methods for generating transgenic animals via embryo 
manipulation and microinjection, particularly animals such as mice, have become 
conventional in the art and are described, for example, in U.S. Patent Nos. 4,736,866; 
4,870,009; and 4,873,191; and Hogan, 1986. In: Manipulating the Mouse Embryo, Cold 
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. Similar methods are used for 
production of other transgenic animals. A transgenic founder animal can be identified based 
upon the presence of the NOVX transgene in its genome and/or expression of NOVX mRNA 
in tissues or cells of the animals. A transgenic founder animal can then be used to breed 
additional animals carrying the transgene. Moreover, transgenic animals carrying a transgene 
encoding NOVX protein can further be bred to other transgenic animals carrying other 
transgenes. 
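
For clarity, the designation "SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178" can be enumerated with a short Python sketch. The sketch assumes that both bounds are inclusive, which this passage does not state explicitly, and the function name is hypothetical.

```python
def nucleic_acid_seq_ids(n_max: int = 178) -> list[int]:
    """Return SEQ ID NO: 2n-1 for n = 1..n_max (inclusive bounds assumed)."""
    return [2 * n - 1 for n in range(1, n_max + 1)]


if __name__ == "__main__":
    ids = nucleic_acid_seq_ids()
    print(ids[:3], ids[-1], len(ids))  # [1, 3, 5] 355 178
```

Under this reading, the cDNA sequences referred to are the odd-numbered SEQ ID NOs from 1 through 355.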

To create a homologous recombinant animal, a vector is prepared which contains at 
least a portion of an NOVX gene into which a deletion, addition or substitution has been 
introduced to thereby alter, e.g., functionally disrupt, the NOVX gene. The NOVX gene can 
be a human gene (e.g., the cDNA of SEQ ID NO: 2n-1, wherein n is an integer between 1 and 
178), but more preferably, is a non-human homologue of a human NOVX gene. For example, 
a mouse homologue of human NOVX gene of SEQ ID NO: 2n-1, wherein n is an integer 
between 1 and 178 can be used to construct a homologous recombination vector suitable for 
altering an endogenous NOVX gene in the mouse genome. In one embodiment, the vector is 
designed such that, upon homologous recombination, the endogenous NOVX gene is 
functionally disrupted (i.e., no longer encodes a functional protein; also referred to as a "knock 
out" vector). 

Alternatively, the vector can be designed such that, upon homologous recombination, 
the endogenous NOVX gene is mutated or otherwise altered but still encodes functional 
protein (e.g., the upstream regulatory region can be altered to thereby alter the expression of 
the endogenous NOVX protein). In the homologous recombination vector, the altered portion 
of the NOVX gene is flanked at its 5'- and 3'-termini by additional nucleic acid of the NOVX 
gene to allow for homologous recombination to occur between the exogenous NOVX gene 
carried by the vector and an endogenous NOVX gene in an embryonic stem cell. The 
additional flanking NOVX nucleic acid is of sufficient length for successful homologous 
recombination with the endogenous gene. Typically, several kilobases of flanking DNA (both 
at the 5'- and 3'-termini) are included in the vector. See, e.g., Thomas, et al., 1987. Cell 51: 
503 for a description of homologous recombination vectors. The vector is then introduced into 
an embryonic stem cell line (e.g., by electroporation) and cells in which the introduced NOVX 
gene has homologously-recombined with the endogenous NOVX gene are selected. See, e.g., 
Li, et al., 1992. Cell 69: 915. 

The selected cells are then injected into a blastocyst of an animal (e.g., a mouse) to 
form aggregation chimeras. See, e.g., Bradley, 1987. In: Teratocarcinomas and 
Embryonic Stem Cells: A Practical Approach, Robertson, ed. IRL, Oxford, pp. 113-152. 
A chimeric embryo can then be implanted into a suitable pseudopregnant female foster animal 
and the embryo brought to term. Progeny harboring the homologously-recombined DNA in 
their germ cells can be used to breed animals in which all cells of the animal contain the 
homologously-recombined DNA by germline transmission of the transgene. Methods for 
constructing homologous recombination vectors and homologous recombinant animals are 
described further in Bradley, 1991. Curr. Opin. Biotechnol. 2: 823-829; PCT International 
Publication Nos.: WO 90/11354; WO 91/01140; WO 92/0968; and WO 93/04169. 

In another embodiment, transgenic non-human animals can be produced that contain 
selected systems that allow for regulated expression of the transgene. One example of such a 
system is the cre/loxP recombinase system of bacteriophage P1. For a description of the 
cre/loxP recombinase system, see, e.g., Lakso, et al., 1992. Proc. Natl. Acad. Sci. USA 89: 
6232-6236. Another example of a recombinase system is the FLP recombinase system of 
Saccharomyces cerevisiae. See, e.g., O'Gorman, et al., 1991. Science 251:1351-1355. If a cre/loxP 
recombinase system is used to regulate expression of the transgene, animals containing 
transgenes encoding both the Cre recombinase and a selected protein are required. Such 
animals can be provided through the construction of "double" transgenic animals, e.g., by 
mating two transgenic animals, one containing a transgene encoding a selected protein and the 
other containing a transgene encoding a recombinase. 
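
The expected yield of "double" transgenic offspring from such a mating can be estimated with a simple enumeration, sketched below in Python. The sketch assumes, for illustration only, that each parent is hemizygous for a single-copy transgene and that the two transgenes segregate independently in Mendelian fashion. Neither assumption is stated in this specification, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def double_transgenic_fraction() -> Fraction:
    """Fraction of offspring inheriting both transgenes from a cross of two
    hemizygous parents, one per transgene (assumed Mendelian and unlinked)."""
    recombinase_parent_gametes = ["recombinase", None]  # transmits the transgene or not
    protein_parent_gametes = ["selected_protein", None]
    offspring = list(product(recombinase_parent_gametes, protein_parent_gametes))
    both = [o for o in offspring if o == ("recombinase", "selected_protein")]
    return Fraction(len(both), len(offspring))


if __name__ == "__main__":
    print(double_transgenic_fraction())  # 1/4
```

Under these assumptions, roughly one offspring in four would carry both transgenes, so founder lines would ordinarily be genotyped to identify the double transgenic animals.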

Clones of the non-human transgenic animals described herein can also be produced 
according to the methods described in Wilmut, et al., 1997. Nature 385: 810-813. In brief, a 
cell (e.g., a somatic cell) from the transgenic animal can be isolated and induced to exit the 
growth cycle and enter G0 phase. The quiescent cell can then be fused, e.g., through the use of 
electrical pulses, to an enucleated oocyte from an animal of the same species from which the 
quiescent cell is isolated. The reconstructed oocyte is then cultured such that it develops to 
morula or blastocyst and then transferred to a pseudopregnant female foster animal. The 
offspring borne of this female foster animal will be a clone of the animal from which the cell 
(e.g., the somatic cell) is isolated. 

Pharmaceutical Compositions 

The NOVX nucleic acid molecules, NOVX proteins, and anti-NOVX antibodies (also 
referred to herein as "active compounds") of the invention, and derivatives, fragments, analogs 
and homologs thereof, can be incorporated into pharmaceutical compositions suitable for 
administration. Such compositions typically comprise the nucleic acid molecule, protein, or 
antibody and a pharmaceutically acceptable carrier. As used herein, "pharmaceutically 
acceptable carrier" is intended to include any and all solvents, dispersion media, coatings, 
antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, 
compatible with pharmaceutical administration. Suitable carriers are described in the most 
recent edition of Remington's Pharmaceutical Sciences, a standard reference text in the field, 
which is incorporated herein by reference. Preferred examples of such carriers or diluents 
include, but are not limited to, water, saline, Ringer's solutions, dextrose solution, and 5% 
human serum albumin. Liposomes and non-aqueous vehicles such as fixed oils may also be 
used. The use of such media and agents for pharmaceutically active substances is well known 
in the art. Except insofar as any conventional media or agent is incompatible with the active 
compound, use thereof in the compositions is contemplated. Supplementary active 
compounds can also be incorporated into the compositions. 

A pharmaceutical composition of the invention is formulated to be compatible with its 
intended route of administration. Examples of routes of administration include parenteral, 
e.g., intravenous, intradermal, subcutaneous, oral (e.g., inhalation), transdermal (i.e., topical), 
transmucosal, and rectal administration. Solutions or suspensions used for parenteral, 
intradermal, or subcutaneous application can include the following components: a sterile 
diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, 
propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or 
methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such 
as ethylenediaminetetraacetic acid (EDTA); buffers such as acetates, citrates or phosphates, 
and agents for the adjustment of tonicity such as sodium chloride or dextrose. The pH can be 
adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral 
preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of 
glass or plastic. 

Pharmaceutical compositions suitable for injectable use include sterile aqueous 
solutions (where water soluble) or dispersions and sterile powders for the extemporaneous 
preparation of sterile injectable solutions or dispersion. For intravenous administration, 
suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, 
Parsippany, N.J.) or phosphate buffered saline (PBS). In all cases, the composition must be 
sterile and should be fluid to the extent that easy syringeability exists. It must be stable under 
the conditions of manufacture and storage and must be preserved against the contaminating 
action of microorganisms such as bacteria and fungi. The carrier can be a solvent or 
dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, 
propylene glycol, and liquid polyethylene glycol, and the like), and suitable mixtures thereof. 
The proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by 
the maintenance of the required particle size in the case of dispersion and by the use of 
surfactants. Prevention of the action of microorganisms can be achieved by various 
antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic 
acid, thimerosal, and the like. In many cases, it will be preferable to include isotonic agents, 
for example, sugars, polyalcohols such as mannitol and sorbitol, or sodium chloride in the 
composition. Prolonged absorption of the injectable compositions can be brought about by 
including in the composition an agent which delays absorption, for example, aluminum 
monostearate and gelatin. 

Sterile injectable solutions can be prepared by incorporating the active compound (e.g., 
an NOVX protein or anti-NOVX antibody) in the required amount in an appropriate solvent 
with one or a combination of ingredients enumerated above, as required, followed by filtered 
sterilization. Generally, dispersions are prepared by incorporating the active compound into a 
sterile vehicle that contains a basic dispersion medium and the required other ingredients from 
those enumerated above. In the case of sterile powders for the preparation of sterile injectable 
solutions, methods of preparation are vacuum drying and freeze-drying that yield a powder of 
the active ingredient plus any additional desired ingredient from a previously sterile-filtered 
solution thereof. 

Oral compositions generally include an inert diluent or an edible carrier. They can be 
enclosed in gelatin capsules or compressed into tablets. For the purpose of oral therapeutic 
administration, the active compound can be incorporated with excipients and used in the form 
of tablets, troches, or capsules. Oral compositions can also be prepared using a fluid carrier 
for use as a mouthwash, wherein the compound in the fluid carrier is applied orally and 
swished and expectorated or swallowed. Pharmaceutically compatible binding agents, and/or 
adjuvant materials can be included as part of the composition. The tablets, pills, capsules, 
troches and the like can contain any of the following ingredients, or compounds of a similar 
nature: a binder such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient 
such as starch or lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a 
lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a 
sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint, 
methyl salicylate, or orange flavoring. 

For administration by inhalation, the compounds are delivered in the form of an 
aerosol spray from a pressurized container or dispenser which contains a suitable propellant, e.g., 
a gas such as carbon dioxide, or a nebulizer. 

Systemic administration can also be by transmucosal or transdermal means. For 
transmucosal or transdermal administration, penetrants appropriate to the barrier to be 
permeated are used in the formulation. Such penetrants are generally known in the art, and 
include, for example, for transmucosal administration, detergents, bile salts, and fusidic acid 
derivatives. Transmucosal administration can be accomplished through the use of nasal sprays 
or suppositories. For transdermal administration, the active compounds are formulated into 
ointments, salves, gels, or creams as generally known in the art. 

The compounds can also be prepared in the form of suppositories (e.g., with 
conventional suppository bases such as cocoa butter and other glycerides) or retention enemas 
for rectal delivery. 

In one embodiment, the active compounds are prepared with carriers that will protect 
the compound against rapid elimination from the body, such as a controlled release 
formulation, including implants and microencapsulated delivery systems. Biodegradable, 
biocompatible polymers can be used, such as ethylene vinyl acetate, polyanhydrides, 
polyglycolic acid, collagen, polyorthoesters, and polylactic acid. Methods for preparation of 
such formulations will be apparent to those skilled in the art. The materials can also be 
obtained commercially from Alza Corporation and Nova Pharmaceuticals, Inc. Liposomal 
suspensions (including liposomes targeted to infected cells with monoclonal antibodies to viral 
antigens) can also be used as pharmaceutically acceptable carriers. These can be prepared 
according to methods known to those skilled in the art, for example, as described in U.S. 
Patent No. 4,522,811. 

It is especially advantageous to formulate oral or parenteral compositions in dosage 
unit form for ease of administration and uniformity of dosage. Dosage unit form as used 
herein refers to physically discrete units suited as unitary dosages for the subject to be treated; 
each unit containing a predetermined quantity of active compound calculated to produce the 
desired therapeutic effect in association with the required pharmaceutical carrier. The 
specifications for the dosage unit forms of the invention are dictated by and directly dependent 
on the unique characteristics of the active compound and the particular therapeutic effect to be 
achieved, and the limitations inherent in the art of compounding such an active compound for 
the treatment of individuals. 

The nucleic acid molecules of the invention can be inserted into vectors and used as 
gene therapy vectors. Gene therapy vectors can be delivered to a subject by, for example, 
intravenous injection, local administration (see, e.g., U.S. Patent No. 5,328,470) or by 
stereotactic injection (see, e.g., Chen, et al., 1994. Proc. Natl. Acad. Sci. USA 91: 3054-3057). 
The pharmaceutical preparation of the gene therapy vector can include the gene therapy vector 
in an acceptable diluent, or can comprise a slow release matrix in which the gene delivery 
vehicle is imbedded. Alternatively, where the complete gene delivery vector can be produced 
intact from recombinant cells, e.g., retroviral vectors, the pharmaceutical preparation can 
include one or more cells that produce the gene delivery system. 

The pharmaceutical compositions can be included in a container, pack, or dispenser 
together with instructions for administration. 

Screening and Detection Methods 

The isolated nucleic acid molecules of the invention can be used to express NOVX 
protein (e.g., via a recombinant expression vector in a host cell in gene therapy applications), 
to detect NOVX mRNA (e.g., in a biological sample) or a genetic lesion in an NOVX gene, 
and to modulate NOVX activity, as described further, below. In addition, the NOVX proteins 
can be used to screen drugs or compounds that modulate the NOVX protein activity or 
expression as well as to treat disorders characterized by insufficient or excessive production of 
NOVX protein or production of NOVX protein forms that have decreased or aberrant activity 
compared to NOVX wild-type protein (e.g., diabetes (regulates insulin release); obesity (binds 
and transports lipids); metabolic disturbances associated with obesity, the metabolic syndrome 
X as well as anorexia and wasting disorders associated with chronic diseases and various 
cancers; infectious disease (possesses anti-microbial activity); and the various 
dyslipidemias). In addition, the anti-NOVX antibodies of the invention can be used to detect 
and isolate NOVX proteins and modulate NOVX activity. In yet a further aspect, the invention 
can be used in methods to influence appetite, absorption of nutrients and the disposition of 
metabolic substrates in both a positive and negative fashion. 

The invention further pertains to novel agents identified by the screening assays 
described herein and uses thereof for treatments as described, supra. 

Screening Assays 

The invention provides a method (also referred to herein as a "screening assay") for 
identifying modulators, i.e., candidate or test compounds or agents (e.g., peptides, 
peptidomimetics, small molecules or other drugs) that bind to NOVX proteins or have a 
stimulatory or inhibitory effect on, e.g., NOVX protein expression or NOVX protein activity. 
The invention also includes compounds identified in the screening assays described herein. 
In one embodiment, the invention provides assays for screening candidate or test compounds 
which bind to or modulate the activity of the membrane-bound form of an NOVX protein or 
polypeptide or biologically-active portion thereof. The test compounds of the invention can be 
obtained using any of the numerous approaches in combinatorial library methods known in the 
art, including: biological libraries; spatially addressable parallel solid phase or solution phase 
libraries; synthetic library methods requiring deconvolution; the "one-bead one-compound" 
library method; and synthetic library methods using affinity chromatography selection. The 
biological library approach is limited to peptide libraries, while the other four approaches are 
applicable to peptide, non-peptide oligomer or small molecule libraries of compounds. See, 
e.g., Lam, 1997 Anticancer Drug Design 12: 145. 

A "small molecule" as used herein, is meant to refer to a composition that has a 
molecular weight of less than about 5 kD and most preferably less than about 4 kD. Small 
molecules can be, e.g., nucleic acids, peptides, polypeptides, peptidomimetics, carbohydrates, 
lipids or other organic or inorganic molecules. Libraries of chemical and/or biological 
mixtures, such as fungal, bacterial, or algal extracts, are known in the art and can be screened 
with any of the assays of the invention. 
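
The molecular weight criteria in the foregoing definition can be expressed as a simple filter, sketched below in Python. Because the definition uses the qualifier "about", the hard cutoffs at 5 kD and 4 kD are an assumed simplification, and the function name and category labels are hypothetical.

```python
def classify_by_molecular_weight(mw_kd: float) -> str:
    """Classify a compound against the 'small molecule' definition.

    The text says 'less than about 5 kD' and 'most preferably less than
    about 4 kD'; exact cutoffs are used here as a simplifying assumption.
    """
    if mw_kd <= 0:
        raise ValueError("molecular weight must be positive")
    if mw_kd < 4.0:
        return "small molecule (most preferred)"
    if mw_kd < 5.0:
        return "small molecule"
    return "not a small molecule"


if __name__ == "__main__":
    for mw in (0.5, 4.5, 12.0):  # hypothetical molecular weights in kD
        print(mw, classify_by_molecular_weight(mw))
```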

Examples of methods for the synthesis of molecular libraries can be found in the art, 
for example in: DeWitt, et al., 1993. Proc. Natl. Acad. Sci. U.S.A. 90: 6909; Erb, et al., 1994. 
Proc. Natl. Acad. Sci. U.S.A. 91: 11422; Zuckermann, et al., 1994. J. Med. Chem. 37: 2678; 
Cho, et al., 1993. Science 261: 1303; Carell, et al., 1994. Angew. Chem. Int. Ed. Engl. 33: 
2059; Carell, et al., 1994. Angew. Chem. Int. Ed. Engl. 33: 2061; and Gallop, et al., 1994. J. 
Med. Chem. 37: 1233. 

Libraries of compounds may be presented in solution (e.g., Houghten, 1992. 
Biotechniques 13: 412-421), or on beads (Lam, 1991. Nature 354: 82-84), on chips (Fodor, 
1993. Nature 364: 555-556), bacteria (Ladner, U.S. Patent No. 5,223,409), spores (Ladner, 
U.S. Patent No. 5,233,409), plasmids (Cull, et al., 1992. Proc. Natl. Acad. Sci. USA 89: 
1865-1869) or on phage (Scott and Smith, 1990. Science 249: 386-390; Devlin, 1990. Science 
249: 404-406; Cwirla, et al., 1990. Proc. Natl. Acad. Sci. U.S.A. 87: 6378-6382; Felici, 1991. 
J. Mol. Biol. 222: 301-310; Ladner, U.S. Patent No. 5,233,409). 

In one embodiment, an assay is a cell-based assay in which a cell which expresses a 
membrane-bound form of NOVX protein, or a biologically-active portion thereof, on the cell 
surface is contacted with a test compound and the ability of the test compound to bind to an 
NOVX protein is determined. The cell, for example, can be of mammalian origin or a yeast cell. 
Determining the ability of the test compound to bind to the NOVX protein can be 
accomplished, for example, by coupling the test compound with a radioisotope or enzymatic 
label such that binding of the test compound to the NOVX protein or biologically-active 
portion thereof can be determined by detecting the labeled compound in a complex. For 
example, test compounds can be labeled with ¹²⁵I, ³⁵S, ¹⁴C, or ³H, either directly or indirectly, 
and the radioisotope detected by direct counting of radioemission or by scintillation counting. 
Alternatively, test compounds can be enzymatically-labeled with, for example, horseradish 
peroxidase, alkaline phosphatase, or luciferase, and the enzymatic label detected by 
determination of conversion of an appropriate substrate to product. In one embodiment, the 
assay comprises contacting a cell which expresses a membrane-bound form of NOVX protein, 
or a biologically-active portion thereof, on the cell surface with a known compound which 
binds NOVX to form an assay mixture, contacting the assay mixture with a test compound, 
and determining the ability of the test compound to interact with an NOVX protein, wherein 
determining the ability of the test compound to interact with an NOVX protein comprises 
determining the ability of the test compound to preferentially bind to NOVX protein or a 
biologically-active portion thereof as compared to the known compound. 
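
As a rough illustration of the competition format described above, the following Python sketch computes how much of a labeled known compound is displaced from NOVX by a test compound. The function name, the background-subtraction step, and the example readouts are illustrative assumptions and are not taken from the specification.

```python
def percent_displacement(signal_without_test, signal_with_test, background=0.0):
    """Percent of specifically bound labeled known compound displaced by a test compound.

    All arguments are label readouts in the same units (e.g., counts per minute).
    'background' is the nonspecific signal; it is a hypothetical correction
    added for illustration only.
    """
    specific_total = signal_without_test - background
    if specific_total <= 0:
        raise ValueError("no specific binding of the known compound to compete against")
    specific_remaining = signal_with_test - background
    return 100.0 * (1.0 - specific_remaining / specific_total)


# Hypothetical readouts: 1100 cpm without test compound, 600 cpm with it, 100 cpm background.
print(percent_displacement(1100, 600, background=100))
```

A higher value would suggest that the test compound competes more effectively with the known compound; any threshold for calling a compound a preferential binder would have to be set by the practitioner.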

In another embodiment, an assay is a cell-based assay comprising contacting a cell 
expressing a membrane-bound form of NOVX protein, or a biologically-active portion thereof, 
on the cell surface with a test compound and determining the ability of the test compound to 
modulate (e.g., stimulate or inhibit) the activity of the NOVX protein or biologically-active 
portion thereof. Determining the ability of the test compound to modulate the activity of 
NOVX or a biologically-active portion thereof can be accomplished, for example, by 
determining the ability of the NOVX protein to bind to or interact with an NOVX target 
molecule. As used herein, a "target molecule" is a molecule with which an NOVX protein 
binds or interacts in nature, for example, a molecule on the surface of a cell which expresses 
an NOVX interacting protein, a molecule on the surface of a second cell, a molecule in the 
extracellular milieu, a molecule associated with the internal surface of a cell membrane or a 
cytoplasmic molecule. An NOVX target molecule can be a non-NOVX molecule or an 
NOVX protein or polypeptide of the invention. In one embodiment, an NOVX target 
molecule is a component of a signal transduction pathway that facilitates transduction of an 
extracellular signal (e.g., a signal generated by binding of a compound to a membrane-bound 
NOVX molecule) through the cell membrane and into the cell. The target, for example, can be 
a second intercellular protein that has catalytic activity or a protein that facilitates the 
association of downstream signaling molecules with NOVX. 

Determining the ability of the NOVX protein to bind to or interact with an NOVX 
target molecule can be accomplished by one of the methods described above for determining 
direct binding. In one embodiment, determining the ability of the NOVX protein to bind to or 
interact with an NOVX target molecule can be accomplished by determining the activity of the 
target molecule. For example, the activity of the target molecule can be determined by 
detecting induction of a cellular second messenger of the target (i.e., intracellular Ca²⁺, 
diacylglycerol, IP3, etc.), detecting catalytic/enzymatic activity of the target on an appropriate 
substrate, detecting the induction of a reporter gene (comprising an NOVX-responsive 
regulatory element operatively linked to a nucleic acid encoding a detectable marker, e.g., 
luciferase), or detecting a cellular response, for example, cell survival, cellular differentiation, 
or cell proliferation. 

In yet another embodiment, an assay of the invention is a cell-free assay comprising 
contacting an NOVX protein or biologically-active portion thereof with a test compound and 
determining the ability of the test compound to bind to the NOVX protein or biologically- 
active portion thereof. Binding of the test compound to the NOVX protein can be determined 
either directly or indirectly as described above. In one such embodiment, the assay comprises 
contacting the NOVX protein or biologically-active portion thereof with a known compound 
which binds NOVX to form an assay mixture, contacting the assay mixture with a test 
compound, and determining the ability of the test compound to interact with an NOVX 
protein, wherein determining the ability of the test compound to interact with an NOVX 
protein comprises determining the ability of the test compound to preferentially bind to NOVX 
or biologically-active portion thereof as compared to the known compound. 

In still another embodiment, an assay is a cell-free assay comprising contacting NOVX 
protein or biologically-active portion thereof with a test compound and determining the ability 
of the test compound to modulate (e.g., stimulate or inhibit) the activity of the NOVX protein 
or biologically-active portion thereof. Determining the ability of the test compound to 
modulate the activity of NOVX can be accomplished, for example, by determining the ability 
of the NOVX protein to bind to an NOVX target molecule by one of the methods described 
above for determining direct binding. In an alternative embodiment, determining the ability of 
the test compound to modulate the activity of NOVX protein can be accomplished by 
determining the ability of the NOVX protein to further modulate an NOVX target molecule. For 
example, the catalytic/enzymatic activity of the target molecule on an appropriate substrate 
can be determined as described, supra. 

In yet another embodiment, the cell-free assay comprises contacting the NOVX protein 
or biologically-active portion thereof with a known compound which binds NOVX protein to 
form an assay mixture, contacting the assay mixture with a test compound, and determining 
the ability of the test compound to interact with an NOVX protein, wherein determining the 
ability of the test compound to interact with an NOVX protein comprises determining the 
ability of the NOVX protein to preferentially bind to or modulate the activity of an NOVX 
target molecule. 

The cell-free assays of the invention are amenable to use of either the soluble form or the 
membrane-bound form of NOVX protein. In the case of cell-free assays comprising the 
membrane-bound form of NOVX protein, it may be desirable to utilize a solubilizing agent 
such that the membrane-bound form of NOVX protein is maintained in solution. Examples of 
such solubilizing agents include non-ionic detergents such as n-octylglucoside, 
n-dodecylglucoside, n-dodecylmaltoside, octanoyl-N-methylglucamide, 
decanoyl-N-methylglucamide, Triton® X-100, Triton® X-114, Thesit®, 
Isotridecypoly(ethylene glycol ether)n, N-dodecyl-N,N-dimethyl-3-ammonio-1-propane 
sulfonate, 3-(3-cholamidopropyl)dimethylamminiol-1-propane sulfonate (CHAPS), or 
3-(3-cholamidopropyl)dimethylamminiol-2-hydroxy-1-propane sulfonate (CHAPSO). 

In more than one embodiment of the above assay methods of the invention, it may be 
desirable to immobilize either NOVX protein or its target molecule to facilitate separation of 
complexed from uncomplexed forms of one or both of the proteins, as well as to accommodate 
automation of the assay. Binding of a test compound to NOVX protein, or interaction of 
NOVX protein with a target molecule in the presence and absence of a candidate compound, 
can be accomplished in any vessel suitable for containing the reactants. Examples of such 
vessels include microtiter plates, test tubes, and micro-centrifuge tubes. In one embodiment, a 
fusion protein can be provided that adds a domain that allows one or both of the proteins to be 
bound to a matrix. For example, GST-NOVX fusion proteins or GST-target fusion proteins 
can be adsorbed onto glutathione sepharose beads (Sigma Chemical, St. Louis, MO) or 
glutathione derivatized microtiter plates, that are then combined with the test compound or the 
test compound and either the non-adsorbed target protein or NOVX protein, and the mixture is 
incubated under conditions conducive to complex formation (e.g., at physiological conditions 
for salt and pH). Following incubation, the beads or microtiter plate wells are washed to 
remove any unbound components, the matrix is immobilized in the case of beads, and the 
complex is determined either directly or indirectly, for example, as described supra. Alternatively, the 
complexes can be dissociated from the matrix, and the level of NOVX protein binding or 
activity determined using standard techniques. 

Other techniques for immobilizing proteins on matrices can also be used in the 
screening assays of the invention. For example, either the NOVX protein or its target 
molecule can be immobilized utilizing conjugation of biotin and streptavidin. Biotinylated 
NOVX protein or target molecules can be prepared from biotin-NHS 
(N-hydroxy-succinimide) using techniques well-known within the art (e.g., biotinylation kit, 
Pierce Chemicals, Rockford, Ill.), and immobilized in the wells of streptavidin-coated 96-well 
plates (Pierce Chemical). Alternatively, antibodies reactive with NOVX protein or target 
molecules, but which do not interfere with binding of the NOVX protein to its target molecule, 
can be derivatized to the wells of the plate, and unbound target or NOVX protein trapped in 
the wells by antibody conjugation. Methods for detecting such complexes, in addition to those 
described above for the GST-immobilized complexes, include immunodetection of complexes 
using antibodies reactive with the NOVX protein or target molecule, as well as enzyme-linked 
assays that rely on detecting an enzymatic activity associated with the NOVX protein or target 
molecule. 

In another embodiment, modulators of NOVX protein expression are identified in a 
method wherein a cell is contacted with a candidate compound and the expression of NOVX 
mRNA or protein in the cell is determined. The level of expression of NOVX mRNA or 
protein in the presence of the candidate compound is compared to the level of expression of 
NOVX mRNA or protein in the absence of the candidate compound. The candidate 
compound can then be identified as a modulator of NOVX mRNA or protein expression based 
upon this comparison. For example, when expression of NOVX mRNA or protein is greater 
(i.e., statistically significantly greater) in the presence of the candidate compound than in its 
absence, the candidate compound is identified as a stimulator of NOVX mRNA or protein 
expression. Alternatively, when expression of NOVX mRNA or protein is less (statistically 
significantly less) in the presence of the candidate compound than in its absence, the candidate 
compound is identified as an inhibitor of NOVX mRNA or protein expression. The level of 
NOVX mRNA or protein expression in the cells can be determined by methods described 
herein for detecting NOVX mRNA or protein. 
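
The comparison described above can be sketched in Python. The specification requires only that the difference be statistically significant and does not name a test; the exact two-sided permutation test, the 0.05 threshold, the function names, and the replicate values below are all assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    n_control = len(pooled) - n_treated
    observed = abs(mean(treated) - mean(control))
    total = sum(pooled)
    hits = 0
    count = 0
    # Enumerate every way of relabeling n_treated of the pooled measurements as "treated".
    for indices in combinations(range(len(pooled)), n_treated):
        group_sum = sum(pooled[i] for i in indices)
        difference = abs(group_sum / n_treated - (total - group_sum) / n_control)
        if difference >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
        count += 1
    return hits / count


def classify_modulator(treated, control, alpha=0.05):
    """Label a candidate compound from replicate NOVX expression measurements."""
    if permutation_p_value(treated, control) >= alpha:
        return "no significant effect"
    return "stimulator" if mean(treated) > mean(control) else "inhibitor"


# Hypothetical relative expression levels with and without the candidate compound.
with_compound = [2.1, 2.4, 2.2, 2.6]
without_compound = [1.0, 1.1, 0.9, 1.2]
print(classify_modulator(with_compound, without_compound))
```

The exhaustive enumeration is practical only for small numbers of replicates; with larger experiments a sampled permutation test or a parametric test would be a more usual choice.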

In yet another aspect of the invention, the NOVX proteins can be used as "bait 
proteins" in a two-hybrid assay or three-hybrid assay (see, e.g., U.S. Patent No. 5,283,317; 
Zervos, et al., 1993. Cell 72: 223-232; Madura, et al., 1993. J. Biol. Chem. 268: 12046-12054; 
Bartel, et al., 1993. Biotechniques 14: 920-924; Iwabuchi, et al., 1993. Oncogene 8: 
1693-1696; and Brent WO 94/10300), to identify other proteins that bind to or interact with 
NOVX ("NOVX-binding proteins" or "NOVX-bp") and modulate NOVX activity. Such 
NOVX-binding proteins are also likely to be involved in the propagation of signals by the 
NOVX proteins as, for example, upstream or downstream elements of the NOVX pathway. 

The two-hybrid system is based on the modular nature of most transcription factors, 
which consist of separable DNA-binding and activation domains. Briefly, the assay utilizes 
two different DNA constructs. In one construct, the gene that codes for NOVX is fused to a 
gene encoding the DNA binding domain of a known transcription factor (e.g., GAL-4). In the 
other construct, a DNA sequence, from a library of DNA sequences, that encodes an 
unidentified protein ("prey" or "sample") is fused to a gene that codes for the activation 
domain of the known transcription factor. If the "bait" and the "prey" proteins are able to 
interact, in vivo, forming an NOVX-dependent complex, the DNA-binding and activation 
domains of the transcription factor are brought into close proximity. This proximity allows 
transcription of a reporter gene (e.g., LacZ) that is operably linked to a transcriptional 
regulatory site responsive to the transcription factor. Expression of the reporter gene can be 
detected and cell colonies containing the functional transcription factor can be isolated and 
used to obtain the cloned gene that encodes the protein which interacts with NOVX. 

The invention further pertains to novel agents identified by the aforementioned 
screening assays and uses thereof for treatments as described herein. 

Detection Assays 

Portions or fragments of the cDNA sequences identified herein (and the corresponding 
complete gene sequences) can be used in numerous ways as polynucleotide reagents. By way 
of example, and not of limitation, these sequences can be used to: (i) map their respective 
genes on a chromosome and, thus, locate gene regions associated with genetic disease; (ii) 
identify an individual from a minute biological sample (tissue typing); and (iii) aid in forensic 
identification of a biological sample. Some of these applications are described in the 
subsections, below. 

Chromosome Mapping 

Once the sequence (or a portion of the sequence) of a gene has been isolated, this 
sequence can be used to map the location of the gene on a chromosome. This process is called 
chromosome mapping. Accordingly, portions or fragments of the NOVX sequences, SEQ ID 
NO: 2n-1, wherein n is an integer between 1 and 178, or fragments or derivatives thereof, can 
be used to map the location of the NOVX genes, respectively, on a chromosome. The 
mapping of the NOVX sequences to chromosomes is an important first step in correlating 
these sequences with genes associated with disease. 
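
For readers who find the shorthand "SEQ ID NO: 2n-1" opaque, the short Python sketch below expands it into explicit identifiers. It assumes that "between 1 and 178" is inclusive of both endpoints; the function name is illustrative.

```python
def nucleic_acid_seq_ids(max_n=178):
    """Expand 'SEQ ID NO: 2n-1' for n = 1..max_n (endpoints assumed inclusive)."""
    return [2 * n - 1 for n in range(1, max_n + 1)]


seq_ids = nucleic_acid_seq_ids()
print(len(seq_ids), seq_ids[:3], seq_ids[-1])
```

Under that assumption the expression covers the odd-numbered identifiers from SEQ ID NO: 1 through SEQ ID NO: 355.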

Briefly, NOVX genes can be mapped to chromosomes by preparing PCR primers 
(preferably 15-25 bp in length) from the NOVX sequences. Computer analysis of the NOVX 
sequences can be used to rapidly select primers that do not span more than one exon in the 
genomic DNA, since a primer that spans an exon boundary would complicate the 
amplification process. These primers can then be used 
for PCR screening of somatic cell hybrids containing individual human chromosomes. Only 
those hybrids containing the human gene corresponding to the NOVX sequences will yield an 
amplified fragment. 
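
The computer-assisted primer selection mentioned above can be sketched as follows. This is a minimal Python illustration that assumes the exon lengths along the cDNA are already known; the function names and the toy sequence are hypothetical, and a practical tool would also screen candidates for melting temperature, self-complementarity and uniqueness.

```python
def exon_spans(exon_lengths):
    """Convert consecutive exon lengths into half-open (start, end) spans on the cDNA."""
    spans, position = [], 0
    for length in exon_lengths:
        spans.append((position, position + length))
        position += length
    return spans


def single_exon_primers(cdna, exon_lengths, primer_len=20):
    """Return (start, primer) pairs for every window that lies wholly inside one exon."""
    if not 15 <= primer_len <= 25:
        raise ValueError("primer length is expected to be 15-25 bp")
    if sum(exon_lengths) != len(cdna):
        raise ValueError("exon lengths must add up to the cDNA length")
    primers = []
    for start, end in exon_spans(exon_lengths):
        # Windows starting later than end - primer_len would cross into the next exon.
        for i in range(start, end - primer_len + 1):
            primers.append((i, cdna[i:i + primer_len]))
    return primers


# Toy cDNA made of three exons of 30, 10 and 30 bases (hypothetical data).
toy_cdna = "A" * 30 + "C" * 10 + "G" * 30
candidates = single_exon_primers(toy_cdna, [30, 10, 30], primer_len=20)
print(len(candidates))
```

In the toy example the 10-base middle exon is too short to hold a 20-base primer, so every candidate comes from the first or third exon.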

Somatic cell hybrids are prepared by fusing somatic cells from different mammals 
(e.g., human and mouse cells). As hybrids of human and mouse cells grow and divide, they 
gradually lose human chromosomes in random order, but retain the mouse chromosomes. By 
using media in which mouse cells cannot grow, because they lack a particular enzyme, but in 
which human cells can, the one human chromosome that contains the gene encoding the 
needed enzyme will be retained. By using various media, panels of hybrid cell lines can be 
established. Each cell line in a panel contains either a single human chromosome or a small 
number of human chromosomes, and a full set of mouse chromosomes, allowing easy 
mapping of individual genes to specific human chromosomes. See, e.g., D'Eustachio, et al., 
1983. Science 220: 919-924. Somatic cell hybrids containing only fragments of human 
chromosomes can also be produced by using human chromosomes with translocations and 
deletions. 
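
The logic of assigning a gene to a chromosome from a hybrid panel can be illustrated with a short Python sketch. It assumes an idealized panel in which the human chromosome content of each hybrid line is known and PCR results are error-free; the panel contents, line names and function name are hypothetical.

```python
def concordant_chromosomes(panel, pcr_positive):
    """Return human chromosomes whose presence pattern matches the PCR results.

    panel: dict mapping hybrid line name -> set of human chromosomes it retains.
    pcr_positive: dict mapping hybrid line name -> True if a fragment was amplified.
    """
    candidates = set().union(*panel.values())
    matches = []
    for chromosome in sorted(candidates):
        # A chromosome is concordant if it is present in every positive line
        # and absent from every negative line.
        if all((chromosome in retained) == pcr_positive[line]
               for line, retained in panel.items()):
            matches.append(chromosome)
    return matches


# Hypothetical four-line panel and PCR screening results.
panel = {"H1": {"1", "7"}, "H2": {"7", "X"}, "H3": {"1", "X"}, "H4": {"2"}}
results = {"H1": True, "H2": True, "H3": False, "H4": False}
print(concordant_chromosomes(panel, results))
```

If more than one chromosome is returned, the panel cannot distinguish them and additional hybrid lines would be needed.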

PCR mapping of somatic cell hybrids is a rapid procedure for assigning a particular 
sequence to a particular chromosome. Three or more sequences can be assigned per day using 
a single thermal cycler. Using the NOVX sequences to design oligonucleotide primers, sub- 
localization can be achieved with panels of fragments from specific chromosomes. 
Fluorescence in situ hybridization (FISH) of a DNA sequence to a metaphase chromosomal 
spread can further be used to provide a precise chromosomal location in one step. 
Chromosome spreads can be made using cells whose division has been blocked in metaphase 
by a chemical like colcemid that disrupts the mitotic spindle. The chromosomes can be treated 
briefly with trypsin, and then stained with Giemsa. A pattern of light and dark bands develops 
on each chromosome, so that the chromosomes can be identified individually. The FISH 
technique can be used with a DNA sequence as short as 500 or 600 bases. However, clones 
larger than 1,000 bases have a higher likelihood of binding to a unique chromosomal location 
with sufficient signal intensity for simple detection. Preferably 1,000 bases, and more 
preferably 2,000 bases, will suffice to get good results in a reasonable amount of time. For a 
review of this technique, see Verma, et al., Human Chromosomes: A Manual of Basic 
Techniques (Pergamon Press, New York 1988). 

Reagents for chromosome mapping can be used individually to mark a single 
chromosome or a single site on that chromosome, or panels of reagents can be used for 
marking multiple sites and/or multiple chromosomes. Reagents corresponding to noncoding 
regions of the genes actually are preferred for mapping purposes. Coding sequences are more 
likely to be conserved within gene families, thus increasing the chance of cross hybridizations 
during chromosomal mapping. 

Once a sequence has been mapped to a precise chromosomal location, the physical 
position of the sequence on the chromosome can be correlated with genetic map data. Such 
data are found, e.g., in McKusick, Mendelian Inheritance in Man (available on-line 
through Johns Hopkins University Welch Medical Library). The relationship between genes 
and disease, mapped to the same chromosomal region, can then be identified through linkage 
analysis (co-inheritance of physically adjacent genes), described in, e.g., Egeland, et al., 1987. 
Nature, 325: 783-787. 

Moreover, differences in the DNA sequences between individuals affected and 
unaffected with a disease associated with the NOVX gene, can be determined. If a mutation is 
observed in some or all of the affected individuals but not in any unaffected individuals, then 
the mutation is likely to be the causative agent of the particular disease. Comparison of 
affected and unaffected individuals generally involves first looking for structural alterations in 
the chromosomes, such as deletions or translocations that are visible from chromosome 
spreads or detectable using PCR based on that DNA sequence. Ultimately, complete 
sequencing of genes from several individuals can be performed to confirm the presence of a 
mutation and to distinguish mutations from polymorphisms. 

Tissue Typing 

The NOVX sequences of the invention can also be used to identify individuals from 
minute biological samples. In this technique, an individual's genomic DNA is digested with 
one or more restriction enzymes, and probed on a Southern blot to yield unique bands for 
identification. The sequences of the invention are useful as additional DNA markers for RFLP 
("restriction fragment length polymorphisms," described in U.S. Patent No. 5,272,057). 
Furthermore, the sequences of the invention can be used to provide an alternative technique 
that determines the actual base-by-base DNA sequence of selected portions of an individual's 
genome. Thus, the NOVX sequences described herein can be used to prepare two PCR 
primers from the 5'- and 3'-termini of the sequences. These primers can then be used to 
amplify an individual's DNA and subsequently sequence it. 

Panels of corresponding DNA sequences from individuals, prepared in this manner, 
can provide unique individual identifications, as each individual will have a unique set of such 
DNA sequences due to allelic differences. The sequences of the invention can be used to 
obtain such identification sequences from individuals and from tissue. The NOVX sequences 
of the invention uniquely represent portions of the human genome. Allelic variation occurs to 
some degree in the coding regions of these sequences, and to a greater degree in the noncoding 
regions. It is estimated that allelic variation between individual humans occurs with a 
frequency of about once per 500 bases. Much of the allelic variation is due to single 
nucleotide polymorphisms (SNPs), which include restriction fragment length polymorphisms 
(RFLPs). 
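
A base-by-base comparison of corresponding sequences from two samples, as contemplated above, might look like the following Python sketch. It assumes the two amplified sequences are already aligned and of equal length; the function name and the example sequences are hypothetical.

```python
def allelic_differences(seq_a, seq_b):
    """List (position, base_a, base_b) for every mismatch between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
            if a != b]


# Hypothetical amplified sequences from two samples at the same locus.
print(allelic_differences("ACGTAC", "ACGTTC"))
```

Positions are zero-based. Sequences containing insertions or deletions would first need to be aligned with a dedicated alignment tool.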

Each of the sequences described herein can, to some degree, be used as a standard against 
which DNA from an individual can be compared for identification purposes. Because greater 
numbers of polymorphisms occur in the noncoding regions, fewer sequences are necessary to 
differentiate individuals. The noncoding sequences can comfortably provide positive 
individual identification with a panel of perhaps 10 to 1,000 primers that each yield a 
noncoding amplified sequence of 100 bases. If predicted coding sequences, such as those in 
SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178, are used, a more appropriate 
number of primers for positive individual identification would be 500-2,000. 
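
The panel sizes quoted above can be put in perspective with a back-of-the-envelope calculation. The Python sketch below simply applies the stated average of about one allelic variation per 500 bases to 100-base amplified sequences; using that genome-wide average for every amplicon is a simplifying assumption, since the text notes that noncoding regions vary more and coding regions less.

```python
def expected_polymorphic_sites(n_primers, amplicon_len=100, bases_per_variant=500):
    """Expected number of variable positions covered by a panel of amplified sequences."""
    return n_primers * amplicon_len / bases_per_variant


for panel_size in (10, 1000):
    print(panel_size, expected_polymorphic_sites(panel_size))
```

Under this assumption a 10-primer panel would cover about 2 variable positions on average and a 1,000-primer panel about 200.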

Predictive Medicine 

The invention also pertains to the field of predictive medicine in which diagnostic 
assays, prognostic assays, pharmacogenomics, and monitoring clinical trials are used for 
prognostic (predictive) purposes to thereby treat an individual prophylactically. Accordingly, 
one aspect of the invention relates to diagnostic assays for determining NOVX protein and/or 
nucleic acid expression as well as NOVX activity, in the context of a biological sample (e.g., 
blood, serum, cells, tissue) to thereby determine whether an individual is afflicted with a 
disease or disorder, or is at risk of developing a disorder, associated with aberrant NOVX 
expression or activity. The disorders include metabolic disorders, diabetes, obesity, infectious 
disease, anorexia, cancer-associated cachexia, cancer, neurodegenerative disorders, 
Alzheimer's Disease, Parkinson's Disorder, immune disorders, and hematopoietic disorders, 
and the various dyslipidemias, metabolic disturbances associated with obesity, the metabolic 
syndrome X and wasting disorders associated with chronic diseases and various cancers. The 
invention also provides for prognostic (or predictive) assays for determining whether an 
individual is at risk of developing a disorder associated with NOVX protein, nucleic acid 
expression or activity. For example, mutations in an NOVX gene can be assayed in a 
biological sample. Such assays can be used for prognostic or predictive purpose to thereby 
prophylactically treat an individual prior to the onset of a disorder characterized by or 
associated with NOVX protein, nucleic acid expression, or biological activity. 

Another aspect of the invention provides methods for determining NOVX protein, 
nucleic acid expression or activity in an individual to thereby select appropriate therapeutic or 
prophylactic agents for that individual (referred to herein as "pharmacogenomics"). 
Pharmacogenomics allows for the selection of agents (e.g., drugs) for therapeutic or 
prophylactic treatment of an individual based on the genotype of the individual (e.g., the 
genotype of the individual examined to determine the ability of the individual to respond to a 
particular agent). 

Yet another aspect of the invention pertains to monitoring the influence of agents (e.g., drugs, 
compounds) on the expression or activity of NOVX in clinical trials. 

These and other agents are described in further detail in the following sections. 

Diagnostic Assays 

An exemplary method for detecting the presence or absence of NOVX in a biological 
sample involves obtaining a biological sample from a test subject and contacting the biological 
sample with a compound or an agent capable of detecting NOVX protein or nucleic acid (e.g., 
mRNA, genomic DNA) that encodes NOVX protein such that the presence of NOVX is 
detected in the biological sample. An agent for detecting NOVX mRNA or genomic DNA is a 
labeled nucleic acid probe capable of hybridizing to NOVX mRNA or genomic DNA. The 
nucleic acid probe can be, for example, a full-length NOVX nucleic acid, such as the nucleic 
acid of SEQ ID NO: 2n-1, wherein n is an integer between 1 and 178, or a portion thereof, 
such as an oligonucleotide of at least 15, 30, 50, 100, 250 or 500 nucleotides in length and 
sufficient to specifically hybridize under stringent conditions to NOVX mRNA or genomic 
DNA. Other suitable probes for use in the diagnostic assays of the invention are described 
herein. 

An agent for detecting NOVX protein is an antibody capable of binding to NOVX 
protein, preferably an antibody with a detectable label. Antibodies can be polyclonal, or more 
preferably, monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or F(ab')2) can be 
used. The term "labeled", with regard to the probe or antibody, is intended to encompass 
direct labeling of the probe or antibody by coupling (i.e., physically linking) a detectable 
substance to the probe or antibody, as well as indirect labeling of the probe or antibody by 
reactivity with another reagent that is directly labeled. Examples of indirect labeling include 
detection of a primary antibody using a fluorescently-labeled secondary antibody and 
end-labeling of a DNA probe with biotin such that it can be detected with fluorescently-labeled 
streptavidin. The term "biological sample" is intended to include tissues, cells and 
biological fluids isolated from a subject, as well as tissues, cells and fluids present within a 
subject. That is, the detection method of the invention can be used to detect NOVX mRNA, 
protein, or genomic DNA in a biological sample in vitro as well as in vivo. For example, in 
vitro techniques for detection of NOVX mRNA include Northern hybridizations and in situ 
hybridizations. In vitro techniques for detection of NOVX protein include enzyme linked 
immunosorbent assays (ELISAs), Western blots, immunoprecipitations, and 
immunofluorescence. In vitro techniques for detection of NOVX genomic DNA include 
Southern hybridizations. Furthermore, in vivo techniques for detection of NOVX protein 
include introducing into a subject a labeled anti-NOVX antibody. For example, the antibody 
can be labeled with a radioactive marker whose presence and location in a subject can be 
detected by standard imaging techniques. 

In one embodiment, the biological sample contains protein molecules from the test 
subject. Alternatively, the biological sample can contain mRNA molecules from the test 
subject or genomic DNA molecules from the test subject. A preferred biological sample is a 
peripheral blood leukocyte sample isolated by conventional means from a subject. 

In another embodiment, the methods further involve obtaining a control biological 
sample from a control subject, contacting the control sample with a compound or agent 
capable of detecting NOVX protein, mRNA, or genomic DNA, such that the presence of 
NOVX protein, mRNA or genomic DNA is detected in the biological sample, and comparing 
the presence of NOVX protein, mRNA or genomic DNA in the control sample with the 
presence of NOVX protein, mRNA or genomic DNA in the test sample. 

The invention also encompasses kits for detecting the presence of NOVX in a 
biological sample. For example, the kit can comprise: a labeled compound or agent capable of 
detecting NOVX protein or mRNA in a biological sample; means for determining the amount 
of NOVX in the sample; and means for comparing the amount of NOVX in the sample with a 
standard. The compound or agent can be packaged in a suitable container. The kit can further 
comprise instructions for using the kit to detect NOVX protein or nucleic acid. 

Prognostic Assays 

The diagnostic methods described herein can furthermore be utilized to identify 
subjects having or at risk of developing a disease or disorder associated with aberrant NOVX 
expression or activity. For example, the assays described herein, such as the preceding 
diagnostic assays or the following assays, can be utilized to identify a subject having or at risk 
of developing a disorder associated with NOVX protein, nucleic acid expression or activity. 
Alternatively, the prognostic assays can be utilized to identify a subject having or at risk for 
developing a disease or disorder. Thus, the invention provides a method for identifying a 
disease or disorder associated with aberrant NOVX expression or activity in which a test 
sample is obtained from a subject and NOVX protein or nucleic acid (e.g., mRNA, genomic 
DNA) is detected, wherein the presence of NOVX protein or nucleic acid is diagnostic for a 
subject having or at risk of developing a disease or disorder associated with aberrant NOVX 
expression or activity. As used herein, a "test sample" refers to a biological sample obtained 
from a subject of interest. For example, a test sample can be a biological fluid (e.g., serum), 
cell sample, or tissue. 

Furthermore, the prognostic assays described herein can be used to determine whether 
a subject can be administered an agent (e.g., an agonist, antagonist, peptidomimetic, protein, 
peptide, nucleic acid, small molecule, or other drug candidate) to treat a disease or disorder 
associated with aberrant NOVX expression or activity. For example, such methods can be 
used to determine whether a subject can be effectively treated with an agent for a disorder. 
Thus, the invention provides methods for determining whether a subject can be effectively 
treated with an agent for a disorder associated with aberrant NOVX expression or activity in 
which a test sample is obtained and NOVX protein or nucleic acid is detected (e.g., wherein 
the presence of NOVX protein or nucleic acid is diagnostic for a subject that can be 
administered the agent to treat a disorder associated with aberrant NOVX expression or 
activity). 

The methods of the invention can also be used to detect genetic lesions in an NOVX 
gene, thereby determining if a subject with the lesioned gene is at risk for a disorder 
characterized by aberrant cell proliferation and/or differentiation. In various embodiments, the 
methods include detecting, in a sample of cells from the subject, the presence or absence of a 
genetic lesion characterized by at least one of an alteration affecting the integrity of a gene 
encoding an NOVX-protein, or the misexpression of the NOVX gene. For example, such 
genetic lesions can be detected by ascertaining the existence of at least one of: (i) a deletion of 
one or more nucleotides from an NOVX gene; (ii) an addition of one or more nucleotides to an 
NOVX gene; (iii) a substitution of one or more nucleotides of an NOVX gene; (iv) a 
chromosomal rearrangement of an NOVX gene; (v) an alteration in the level of a messenger 
RNA transcript of an NOVX gene; (vi) aberrant modification of an NOVX gene, such as of the 
methylation pattern of the genomic DNA; (vii) the presence of a non-wild-type splicing pattern 
of a messenger RNA transcript of an NOVX gene; (viii) a non-wild-type level of an NOVX 
protein; (ix) allelic loss of an NOVX gene; and (x) inappropriate post-translational 
modification of an NOVX protein. As described herein, there are a large number of assay 
techniques known in the art which can be used for detecting lesions in an NOVX gene. A 
preferred biological sample is a peripheral blood leukocyte sample isolated by conventional 
means from a subject. However, any biological sample containing nucleated cells may be 
used, including, for example, buccal mucosal cells. 

In certain embodiments, detection of the lesion involves the use of a probe/primer in a 
polymerase chain reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202), such 
as anchor PCR or RACE PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g., 
Landegren, et al., 1988. Science 241: 1077-1080; and Nakazawa, et al., 1994. Proc. Natl. 
Acad. Sci. USA 91: 360-364), the latter of which can be particularly useful for detecting point 
mutations in the NOVX-gene (see, Abravaya, et al., 1995. Nucl. Acids Res. 23: 675-682). 
This method can include the steps of collecting a sample of cells from a patient, isolating 
nucleic acid (e.g., genomic, mRNA or both) from the cells of the sample, contacting the 
nucleic acid sample with one or more primers that specifically hybridize to an NOVX gene 
under conditions such that hybridization and amplification of the NOVX gene (if present) 
occurs, and detecting the presence or absence of an amplification product, or detecting the size 
of the amplification product and comparing the length to a control sample. It is anticipated 
that PCR and/or LCR may be desirable to use as a preliminary amplification step in 
conjunction with any of the techniques used for detecting mutations described herein. 

Alternative amplification methods include: self-sustained sequence replication (see, 
Guatelli, et al., 1990. Proc. Natl. Acad. Sci. USA 87: 1874-1878), transcriptional amplification 
system (see, Kwoh, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 1173-1177), Qβ Replicase 
(see, Lizardi, et al., 1988. BioTechnology 6: 1197), or any other nucleic acid amplification 
method, followed by the detection of the amplified molecules using techniques well known to 
those of skill in the art. These detection schemes are especially useful for the detection of 
nucleic acid molecules if such molecules are present in very low numbers. 

In an alternative embodiment, mutations in an NOVX gene from a sample cell can be 
identified by alterations in restriction enzyme cleavage patterns. For example, sample and 
control DNA is isolated, amplified (optionally), digested with one or more restriction 
endonucleases, and fragment length sizes are determined by gel electrophoresis and compared. 
Differences in fragment length sizes between sample and control DNA indicate mutations in 
the sample DNA. Moreover, sequence-specific ribozymes (see, e.g., U.S. Patent 
No. 5,493,531) can be used to score for the presence of specific mutations by development or 
loss of a ribozyme cleavage site. 
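
By way of non-limiting illustration, the comparison of restriction fragment lengths between sample and control DNA can be sketched in silico as follows. The sketch is written in Python; the function names, the treatment of the DNA as a linear molecule, and the default choice of EcoRI (recognition site GAATTC, cleaving after the first G) are assumptions made for illustration only and do not form part of the methods described herein.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return the sorted fragment lengths obtained by cutting a linear
    sequence at every occurrence of a recognition site.

    cut_offset is the number of bases into the site at which the cut falls
    (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Fragment lengths are the gaps between consecutive cut positions.
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


def patterns_differ(sample: str, control: str,
                    site: str = "GAATTC", cut_offset: int = 1) -> bool:
    """True when sample and control yield different fragment-length patterns."""
    return digest(sample, site, cut_offset) != digest(control, site, cut_offset)
```

A difference between the two fragment-length lists corresponds to a difference in the banding pattern observed by gel electrophoresis. A substitution that neither creates nor destroys a recognition site leaves the pattern unchanged and would not be detected by this approach.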

In other embodiments, genetic mutations in NOVX can be identified by hybridizing a 
sample and control nucleic acids, e.g., DNA or RNA, to high-density arrays containing 
hundreds or thousands of oligonucleotide probes. See, e.g., Cronin, et al., 1996. Human 
Mutation 7: 244-255; Kozal, et al., 1996. Nat. Med. 2: 753-759. For example, genetic 
mutations in NOVX can be identified in two dimensional arrays containing light-generated 
DNA probes as described in Cronin, et al., supra. Briefly, a first hybridization array of probes 
can be used to scan through long stretches of DNA in a sample and control to identify base 
changes between the sequences by making linear arrays of sequential overlapping probes. 
This step allows the identification of point mutations. This is followed by a second 
hybridization array that allows the characterization of specific mutations by using smaller, 
specialized probe arrays complementary to all variants or mutations detected. Each mutation 
array is composed of parallel probe sets, one complementary to the wild-type gene and the 
other complementary to the mutant gene. 
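
By way of non-limiting illustration, the design of the two probe arrays can be sketched as follows. The sketch is written in Python; the function names, the default probe length of 25 nucleotides, and the one-base step between probes are hypothetical choices made for illustration and are not taken from Cronin, et al. or from the present disclosure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tiling_probes(target: str, length: int = 25, step: int = 1) -> list[str]:
    """First array: sequential overlapping probes complementary to the target."""
    target = target.upper()
    return [reverse_complement(target[i:i + length])
            for i in range(0, len(target) - length + 1, step)]


def mutation_probe_pair(target: str, position: int, mutant_base: str,
                        length: int = 25) -> tuple[str, str]:
    """Second array: a (wild-type, mutant) probe pair centered on a
    0-based position at which a variant was detected."""
    target = target.upper()
    half = length // 2
    start = position - half
    if start < 0 or start + length > len(target):
        raise ValueError("position is too close to an end of the target")
    window = target[start:start + length]
    mutant = window[:half] + mutant_base.upper() + window[half + 1:]
    return reverse_complement(window), reverse_complement(mutant)
```

In this sketch each probe pair differs only at the central base, so that a labeled sample hybridizing preferentially to one member of the pair indicates which allele is present.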

In yet another embodiment, any of a variety of sequencing reactions known in the art 
can be used to directly sequence the NOVX gene and detect mutations by comparing the 
sequence of the sample NOVX with the corresponding wild-type (control) sequence. 
Examples of sequencing reactions include those based on techniques developed by Maxam and 
Gilbert, 1977. Proc. Natl. Acad. Sci. USA 74: 560 or Sanger, 1977. Proc. Natl. Acad. Sci. USA 
74: 5463. It is also contemplated that any of a variety of automated sequencing procedures 
can be utilized when performing the diagnostic assays (see, e.g., Naeve, et al., 1995. 
Biotechniques 19: 448), including sequencing by mass spectrometry (see, e.g., PCT 
International Publication No. WO 94/16101; Cohen, et al., 1996. Adv. Chromatography 36: 
127-162; and Griffin, et al., 1993. Appl. Biochem. Biotechnol. 38: 147-159). 
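
By way of non-limiting illustration, the comparison of a sample sequence with the corresponding wild-type (control) sequence can be sketched as follows. The sketch is written in Python; the function name is hypothetical, and the sketch assumes that the two sequences have already been aligned and are of equal length, so that only substitutions are reported.

```python
def find_substitutions(sample: str, wild_type: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, wild-type base, sample base) for every
    position at which the sample differs from the wild-type sequence."""
    if len(sample) != len(wild_type):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    return [
        (i + 1, w, s)
        for i, (w, s) in enumerate(zip(wild_type.upper(), sample.upper()))
        if w != s
    ]
```

Insertions and deletions would require a gapped alignment, which is outside the scope of this sketch.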

Other methods for detecting mutations in the NOVX gene include methods in which 
protection from cleavage agents is used to detect mismatched bases in RNA/RNA or 
RNA/DNA heteroduplexes. See, e.g., Myers, et al., 1985. Science 230: 1242. In general, the 
art technique of "mismatch cleavage" starts by providing heteroduplexes formed by 
hybridizing (labeled) RNA or DNA containing the wild-type NOVX sequence with potentially 
mutant RNA or DNA obtained from a tissue sample. The double-stranded duplexes are 
treated with an agent that cleaves single-stranded regions of the duplex, such as those that 
exist due to basepair mismatches between the control and sample strands. For instance, 
RNA/DNA duplexes can be treated with RNase and DNA/DNA hybrids treated with S1 
nuclease to enzymatically digest the mismatched regions. In other embodiments, either 
DNA/DNA or RNA/DNA duplexes can be treated with hydroxylamine or osmium tetroxide 
and with piperidine in order to digest mismatched regions. After digestion of the mismatched 
regions, the resulting material is then separated by size on denaturing polyacrylamide gels to 
determine the site of mutation. See, e.g., Cotton, et al., 1988. Proc. Natl. Acad. Sci. USA 85: 
4397; Saleeba, et al., 1992. Methods Enzymol. 217: 286-295. In an embodiment, the control 
DNA or RNA can be labeled for detection. 

In still another embodiment, the mismatch cleavage reaction employs one or more 
proteins that recognize mismatched base pairs in double-stranded DNA (so called "DNA 
mismatch repair" enzymes) in defined systems for detecting and mapping point mutations in 
NOVX cDNAs obtained from samples of cells. For example, the mutY enzyme of E. coli 
cleaves A at G/A mismatches and the thymidine DNA glycosylase from HeLa cells cleaves T 
at G/T mismatches. See, e.g., Hsu, et al., 1994. Carcinogenesis 15: 1657-1662. According to 
an exemplary embodiment, a probe based on an NOVX sequence, e.g., a wild-type NOVX 
sequence, is hybridized to a cDNA or other DNA product from a test cell(s). The duplex is 
treated with a DNA mismatch repair enzyme, and the cleavage products, if any, can be 
detected from electrophoresis protocols or the like. See, e.g., U.S. Patent No. 5,459,039. 

In other embodiments, alterations in electrophoretic mobility will be used to identify 
mutations in NOVX genes. For example, single strand conformation polymorphism (SSCP) 
may be used to detect differences in electrophoretic mobility between mutant and wild type 
nucleic acids. See, e.g., Orita, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 2766; Cotton, 
1993. Mutat. Res. 285: 125-144; Hayashi, 1992. Genet. Anal. Tech. Appl. 9: 73-79. 
Single-stranded DNA fragments of sample and control NOVX nucleic acids will be denatured 
and allowed to renature. The secondary structure of single-stranded nucleic acids varies 
according to sequence; the resulting alteration in electrophoretic mobility enables the detection 
of even a single base change. The DNA fragments may be labeled or detected with labeled 
probes. The sensitivity of the assay may be enhanced by using RNA (rather than DNA), in 
which the secondary structure is more sensitive to a change in sequence. In one embodiment, 
the subject method utilizes heteroduplex analysis to separate double stranded heteroduplex 
molecules on the basis of changes in electrophoretic mobility. See, e.g., Keen, et al., 1991. 
Trends Genet. 7: 5. 

In yet another embodiment, the movement of mutant or wild-type fragments in 
polyacrylamide gels containing a gradient of denaturant is assayed using denaturing gradient 
gel electrophoresis (DGGE). See, e.g., Myers, et al., 1985. Nature 313: 495. When DGGE is 
used as the method of analysis, DNA will be modified to ensure that it does not completely 
denature, for example by adding a GC clamp of approximately 40 bp of high-melting GC-rich 
DNA by PCR. In a further embodiment, a temperature gradient is used in place of a 
denaturing gradient to identify differences in the mobility of control and sample DNA. See, 
e.g., Rosenbaum and Reissner, 1987. Biophys. Chem. 265: 12753. 

Examples of other techniques for detecting point mutations include, but are not limited 
to, selective oligonucleotide hybridization, selective amplification, or selective primer 
extension. For example, oligonucleotide primers may be prepared in which the known 
mutation is placed centrally and then hybridized to target DNA under conditions that permit 
hybridization only if a perfect match is found. See, e.g., Saiki, et al., 1986. Nature 324: 163; 
Saiki, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 6230. Such allele specific oligonucleotides 
are hybridized to PCR amplified target DNA or a number of different mutations when the 
oligonucleotides are attached to the hybridizing membrane and hybridized with labeled target 
DNA. 

Alternatively, allele specific amplification technology that depends on selective PCR 
amplification may be used in conjunction with the instant invention. Oligonucleotides used as 
primers for specific amplification may carry the mutation of interest in the center of the 
molecule (so that amplification depends on differential hybridization; see, e.g., Gibbs, et al., 
1989. Nucl. Acids Res. 17: 2437-2448) or at the extreme 3'-terminus of one primer where, 
under appropriate conditions, mismatch can prevent or reduce polymerase extension (see, e.g., 
Prossner, 1993. Tibtech. 11: 238). In addition it may be desirable to introduce a novel 
restriction site in the region of the mutation to create cleavage-based detection. See, e.g., 
Gasparini, et al., 1992. Mol. Cell Probes 6: 1. It is anticipated that in certain embodiments 
amplification may also be performed using Taq ligase. See, e.g., Barany, 
1991. Proc. Natl. Acad. Sci. USA 88: 189. In such cases, ligation will occur only if there is a 
perfect match at the 3'-terminus of the 5' sequence, making it possible to detect the presence of 
a known mutation at a specific site by looking for the presence or absence of amplification. 

The methods described herein may be performed, for example, by utilizing 
pre-packaged diagnostic kits comprising at least one probe nucleic acid or antibody reagent 
described herein, which may be conveniently used, e.g., in clinical settings to diagnose 
patients exhibiting symptoms or family history of a disease or illness involving an NOVX 
gene. 

Furthermore, any cell type or tissue, preferably peripheral blood leukocytes, in which NOVX 
is expressed may be utilized in the prognostic assays described herein. However, any 
biological sample containing nucleated cells may be used, including, for example, buccal 
mucosal cells. 

Pharmacogenomics 

Agents, or modulators that have a stimulatory or inhibitory effect on NOVX activity 
(e.g., NOVX gene expression), as identified by a screening assay described herein can be 
administered to individuals to treat (prophylactically or therapeutically) disorders (The 
disorders include metabolic disorders, diabetes, obesity, infectious disease, anorexia, 
cancer-associated cachexia, cancer, neurodegenerative disorders, Alzheimer's Disease, Parkinson's 
Disorder, immune disorders, and hematopoietic disorders, and the various dyslipidemias, 
metabolic disturbances associated with obesity, the metabolic syndrome X and wasting 
disorders associated with chronic diseases and various cancers.) In conjunction with such 
treatment, the pharmacogenomics (i.e., the study of the relationship between an individual's 
genotype and that individual's response to a foreign compound or drug) of the individual may 
be considered. Differences in metabolism of therapeutics can lead to severe toxicity or 
therapeutic failure by altering the relation between dose and blood concentration of the 
pharmacologically active drug. Thus, the pharmacogenomics of the individual permits the 
selection of effective agents (e.g., drugs) for prophylactic or therapeutic treatments based on a 
consideration of the individual's genotype. Such pharmacogenomics can further be used to 
determine appropriate dosages and therapeutic regimens. Accordingly, the activity of NOVX 
protein, expression of NOVX nucleic acid, or mutation content of NOVX genes in an 
individual can be determined to thereby select appropriate agent(s) for therapeutic or 
prophylactic treatment of the individual. 

Pharmacogenomics deals with clinically significant hereditary variations in the 
response to drugs due to altered drug disposition and abnormal action in affected persons. See 
e.g., Eichelbaum, 1996. Clin. Exp. Pharmacol. Physiol., 23: 983-985; Linder, 1997. Clin. 
Chem., 43: 254-266. In general, two types of pharmacogenetic conditions can be 
differentiated: genetic conditions transmitted as a single factor altering the way drugs act on 
the body (altered drug action), or genetic conditions transmitted as single factors altering the 
way the body acts on drugs (altered drug metabolism). These pharmacogenetic conditions can 
occur either as rare defects or as polymorphisms. For example, glucose-6-phosphate 
dehydrogenase (G6PD) deficiency is a common inherited enzymopathy in which the main 
clinical complication is hemolysis after ingestion of oxidant drugs (anti-malarials, 
sulfonamides, analgesics, nitrofurans) and consumption of fava beans. 

As an illustrative embodiment, the activity of drug metabolizing enzymes is a major 
determinant of both the intensity and duration of drug action. The discovery of genetic 
polymorphisms of drug metabolizing enzymes (e.g., N-acetyltransferase 2 (NAT 2) and 
cytochrome P450 enzymes CYP2D6 and 
CYP2C19) has provided an explanation as to why some patients do not obtain the expected 
drug effects or show exaggerated drug response and serious toxicity after taking the standard 
and safe dose of a drug. These polymorphisms are expressed in two phenotypes in the 
population, the extensive metabolizer (EM) and poor metabolizer (PM). The prevalence of 
PM is different among different populations. For example, the gene coding for CYP2D6 is 
highly polymorphic and several mutations have been identified in PM, which all lead to the 
absence of functional CYP2D6. Poor metabolizers of CYP2D6 and CYP2C19 quite 
frequently experience exaggerated drug response and side effects when they receive standard 
doses. If a metabolite is the active therapeutic moiety, PM show no therapeutic response, as 
demonstrated for the analgesic effect of codeine mediated by its CYP2D6-formed metabolite 
morphine. At the other extreme are the so called ultra-rapid metabolizers who do not respond 
to standard doses. Recently, the molecular basis of ultra-rapid metabolism has been identified 
to be due to CYP2D6 gene amplification. 

Thus, the activity of NOVX protein, expression of NOVX nucleic acid, or mutation content of 
NOVX genes in an individual can be determined to thereby select appropriate agent(s) for 
therapeutic or prophylactic treatment of the individual. In addition, pharmacogenetic studies 
can be used to apply genotyping of polymorphic alleles encoding drug-metabolizing enzymes 
to the identification of an individual's drug responsiveness phenotype. This knowledge, when 
applied to dosing or drug selection, can avoid adverse reactions or therapeutic failure and thus 
enhance therapeutic or prophylactic efficiency when treating a subject with an NOVX 
modulator, such as a modulator identified by one of the exemplary screening assays described 
herein. 

Monitoring of Effects During Clinical Trials 

Monitoring the influence of agents (e.g., drugs, compounds) on the expression or 
activity of NOVX (e.g., the ability to modulate aberrant cell proliferation and/or 
differentiation) can be applied not only in basic drug screening, but also in clinical trials. For 
example, the effectiveness of an agent determined by a screening assay as described herein to 
increase NOVX gene expression, protein levels, or upregulate NOVX activity, can be 
monitored in clinical trials of subjects exhibiting decreased NOVX gene expression, protein 
levels, or downregulated NOVX activity. Alternatively, the effectiveness of an agent 
determined by a screening assay to decrease NOVX gene expression, protein levels, or 
downregulate NOVX activity, can be monitored in clinical trials of subjects exhibiting 
increased NOVX gene expression, protein levels, or upregulated NOVX activity. In such 
clinical trials, the expression or activity of NOVX and, preferably, other genes that have been 
implicated in, for example, a cellular proliferation or immune disorder can be used as a "read 
out" or markers of the immune responsiveness of a particular cell. 

By way of example, and not of limitation, genes, including NOVX, that are modulated 
in cells by treatment with an agent (e.g., compound, drug or small molecule) that modulates 
NOVX activity (e.g., identified in a screening assay as described herein) can be identified. 
Thus, to study the effect of agents on cellular proliferation disorders, for example, in a clinical 
trial, cells can be isolated and RNA prepared and analyzed for the levels of expression of 
NOVX and other genes implicated in the disorder. The levels of gene expression (i.e., a gene 
expression pattern) can be quantified by Northern blot analysis or RT-PCR, as described 
herein, or alternatively by measuring the amount of protein produced, by one of the methods 
as described herein, or by measuring the levels of activity of NOVX or other genes. In this 
manner, the gene expression pattern can serve as a marker, indicative of the physiological 
response of the cells to the agent. Accordingly, this response state may be determined before, 
and at various points during, treatment of the individual with the agent. 

In one embodiment, the invention provides a method for monitoring the effectiveness 
of treatment of a subject with an agent (e.g., an agonist, antagonist, protein, peptide, 
peptidomimetic, nucleic acid, small molecule, or other drug candidate identified by the 
screening assays described herein) comprising the steps of (i) obtaining a pre-administration 
sample from a subject prior to administration of the agent; (ii) detecting the level of expression 
of an NOVX protein, mRNA, or genomic DNA in the pre-administration sample; (iii) obtaining 
one or more post-administration samples from the subject; (iv) detecting the level of 
expression or activity of the NOVX protein, mRNA, or genomic DNA in the 
post-administration samples; (v) comparing the level of expression or activity of the NOVX 
protein, mRNA, or genomic DNA in the pre-administration sample with the NOVX protein, 
mRNA, or genomic DNA in the post administration sample or samples; and (vi) altering the 
administration of the agent to the subject accordingly. For example, increased administration 
of the agent may be desirable to increase the expression or activity of NOVX to higher levels 
than detected, i.e., to increase the effectiveness of the agent. Alternatively, decreased 
administration of the agent may be desirable to decrease expression or activity of NOVX to 
lower levels than detected, i.e., to decrease the effectiveness of the agent. 
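
By way of non-limiting illustration, steps (v) and (vi) of the foregoing method can be sketched as follows. The sketch is written in Python and assumes, for illustration only, an agent intended to increase NOVX expression; the function name, the use of a mean over the post-administration samples, and the target range expressed as a fold change over the pre-administration level are hypothetical and are not specified by the invention.

```python
from statistics import mean


def suggest_adjustment(pre_level: float, post_levels: list[float],
                       target_low: float, target_high: float) -> tuple[float, str]:
    """Compare pre- and post-administration NOVX levels (step v) and
    suggest how administration might be altered (step vi).

    target_low and target_high bound a hypothetical desired fold change
    relative to the pre-administration level.
    """
    if pre_level <= 0:
        raise ValueError("pre-administration level must be positive")
    if not post_levels:
        raise ValueError("at least one post-administration sample is required")
    fold_change = mean(post_levels) / pre_level
    if fold_change < target_low:
        return fold_change, "increase administration"
    if fold_change > target_high:
        return fold_change, "decrease administration"
    return fold_change, "maintain administration"
```

The levels supplied to such a function could be obtained by any of the detection methods described herein, e.g., Northern blot analysis or RT-PCR for mRNA.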



Methods of Treatment 

The invention provides for both prophylactic and therapeutic methods of treating a 
subject at risk of (or susceptible to) a disorder or having a disorder associated with aberrant 
NOVX expression or activity. The disorders include cardiomyopathy, atherosclerosis, 
hypertension, congenital heart defects, aortic stenosis, atrial septal defect (ASD), 
atrioventricular (A-V) canal defect, ductus arteriosus, pulmonary stenosis, subaortic stenosis, 
ventricular septal defect (VSD), valve diseases, tuberous sclerosis, scleroderma, obesity, 
transplantation, adrenoleukodystrophy, congenital adrenal hyperplasia, prostate cancer, 
neoplasm; adenocarcinoma, lymphoma, uterus cancer, fertility, hemophilia, hypercoagulation, 
idiopathic thrombocytopenic purpura, immunodeficiencies, graft versus host disease, AIDS, 
bronchial asthma, Crohn's disease; multiple sclerosis, treatment of Albright Hereditary 
Osteodystrophy, and other diseases, disorders and conditions of the like. 

These methods of treatment will be discussed more fully, below. 

Diseases and Disorders 

Diseases and disorders that are characterized by increased (relative to a subject not 
suffering from the disease or disorder) levels or biological activity may be treated with 
Therapeutics that antagonize (i.e., reduce or inhibit) activity. Therapeutics that antagonize 
activity may be administered in a therapeutic or prophylactic manner. Therapeutics that may 
be utilized include, but are not limited to: (i) an aforementioned peptide, or analogs, 
derivatives, fragments or homologs thereof; (ii) antibodies to an aforementioned peptide; (iii) 
nucleic acids encoding an aforementioned peptide; (iv) administration of antisense nucleic acid 
and nucleic acids that are "dysfunctional" (i.e., due to a heterologous insertion within the 
coding sequences of an aforementioned peptide) that are utilized to 
"knockout" endogenous function of an aforementioned peptide by homologous recombination 
(see, e.g., Capecchi, 1989. Science 244: 1288-1292); or (v) modulators (i.e., inhibitors, 
agonists and antagonists, including additional peptide mimetic of the invention or antibodies 
specific to a peptide of the invention) that alter the interaction between an aforementioned 
peptide and its binding partner. 

Diseases and disorders that are characterized by decreased (relative to a subject not suffering 
from the disease or disorder) levels or biological activity may be treated with Therapeutics that 
increase (i.e., are agonists to) activity. Therapeutics that upregulate activity may be 
administered in a therapeutic or prophylactic manner. Therapeutics that may be utilized 
include, but are not limited to, an aforementioned peptide, or analogs, derivatives, fragments 
or homologs thereof; or an agonist that increases bioavailability. 

Increased or decreased levels can be readily detected by quantifying peptide and/or 
RNA, by obtaining a patient tissue sample (e.g., from biopsy tissue) and assaying it in vitro for 
RNA or peptide levels, structure and/or activity of the expressed peptides (or mRNAs of an 
aforementioned peptide). Methods that are well-known within the art include, but are not 
limited to, immunoassays (e.g., by Western blot analysis, immunoprecipitation followed by 
sodium dodecyl sulfate (SDS) polyacrylamide gel electrophoresis, immunocytochemistry, etc.) 
and/or hybridization assays to detect expression of mRNAs (e.g., Northern assays, dot blots, in 
situ hybridization, and the like). 

Prophylactic Methods 

In one aspect, the invention provides a method for preventing, in a subject, a disease or 
condition associated with an aberrant NOVX expression or activity, by administering to the 
subject an agent that modulates NOVX expression or at least one NOVX activity. Subjects at 
risk for a disease that is caused or contributed to by aberrant NOVX expression or activity can 
be identified by, for example, any or a combination of diagnostic or prognostic assays as 
described herein. Administration of a prophylactic agent can occur prior to the manifestation 
of symptoms characteristic of the NOVX aberrancy, such that a disease or disorder is 
prevented or, alternatively, delayed in its progression. Depending upon the type of NOVX 
aberrancy, for example, an NOVX agonist or NOVX antagonist agent can be used for treating 
the subject. The appropriate agent can be determined based on screening assays described 
herein. The prophylactic methods of the invention are further discussed in the following 
subsections. 

Therapeutic Methods 

Another aspect of the invention pertains to methods of modulating NOVX expression 
or activity for therapeutic purposes. The modulatory method of the invention involves 
contacting a cell with an agent that modulates one or more of the activities of NOVX protein 
associated with the cell. An agent that modulates NOVX protein activity can be an 
agent as described herein, such as a nucleic acid or a protein, a naturally-occurring cognate 
ligand of an NOVX protein, a peptide, an NOVX peptidomimetic, or other small molecule. In 
one embodiment, the agent stimulates one or more NOVX protein activities. Examples of such 
stimulatory agents include active NOVX protein and a nucleic acid molecule encoding NOVX 
that has been introduced into the cell. In another embodiment, the agent inhibits one or more 
NOVX protein activities. Examples of such inhibitory agents include antisense NOVX nucleic 
acid molecules and anti-NOVX antibodies. These modulatory methods can be performed in 
vitro (e.g., by culturing the cell with the agent) or, alternatively, in vivo (e.g., by administering 
the agent to a subject). As such, the invention provides methods of treating an individual 
afflicted with a disease or disorder characterized by aberrant expression or activity of an 
NOVX protein or nucleic acid molecule. In one embodiment, the method involves 
administering an agent (e.g., an agent identified by a screening assay described herein), or 
combination of agents that modulates (e.g., up-regulates or down-regulates) NOVX expression 
or activity. In another embodiment, the method involves administering an NOVX protein or 
nucleic acid molecule as therapy to compensate for reduced or aberrant NOVX expression or 
activity. 

Stimulation of NOVX activity is desirable in situations in which NOVX is abnormally 
downregulated and/or in which increased NOVX activity is likely to have a beneficial effect. 
One example of such a situation is where a subject has a disorder characterized by aberrant 
cell proliferation and/or differentiation (e.g., cancer or immune associated disorders). Another 
example of such a situation is where the subject has a gestational disease (e.g., preeclampsia). 

Determination of the Biological Effect of the Therapeutic 

In various embodiments of the invention, suitable in vitro or in vivo assays are 
performed to determine the effect of a specific Therapeutic and whether its administration is 
indicated for treatment of the affected tissue. 

In various specific embodiments, in vitro assays may be performed with representative cells of 
the type(s) involved in the patient's disorder, to determine if a given Therapeutic exerts the 
desired effect upon the cell type(s). Compounds for use in therapy may be tested in suitable 
animal model systems including, but not limited to rats, mice, chickens, cows, monkeys, 
rabbits, and the like, prior to testing in human subjects. Similarly, for in vivo testing, any of 
the animal model systems known in the art may be used prior to administration to human 
subjects. 

Prophylactic and Therapeutic Uses of the Compositions of the Invention 

The NOVX nucleic acids and proteins of the invention are useful in potential 
prophylactic and therapeutic applications implicated in a variety of disorders including, but not 
limited to: metabolic disorders, diabetes, obesity, infectious disease, anorexia, 
cancer-associated cachexia, cancer, neurodegenerative disorders, Alzheimer's Disease, Parkinson's Disorder, 
immune disorders, hematopoietic disorders, and the various dyslipidemias, metabolic 
disturbances associated with obesity, the metabolic syndrome X and wasting disorders 
associated with chronic diseases and various cancers. 

As an example, a cDNA encoding the NOVX protein of the invention may be useful in 
gene therapy, and the protein may be useful when administered to a subject in need thereof. 
By way of non-limiting example, the compositions of the invention will have efficacy for 
treatment of patients suffering from: metabolic disorders, diabetes, obesity, infectious disease, 
anorexia, cancer-associated cachexia, cancer, neurodegenerative disorders, Alzheimer's 
Disease, Parkinson's Disorder, immune disorders, hematopoietic disorders, and the various 
dyslipidemias. 

Both the novel nucleic acid encoding the NOVX protein, and the NOVX protein of the 
invention, or fragments thereof, may also be useful in diagnostic applications, wherein the 
presence or amount of the nucleic acid or the protein are to be assessed. A further use could 
be as an anti-bacterial molecule (i.e., some peptides have been found to possess anti-bacterial 
properties). These materials are further useful in the generation of antibodies, which 
immunospecifically-bind to the novel substances of the invention for use in therapeutic or 
diagnostic methods. 

Sequence Analyses 

The sequence of NOVX was derived by laboratory cloning of cDNA fragments or by in 
silico prediction of the sequence. cDNA fragments covering either the full length of the DNA 
sequence, or part of the sequence, or both, were cloned. In silico prediction was based on 
sequences available in CuraGen's proprietary sequence databases or in the public human 
sequence databases, and provided either the full length DNA sequence, or some portion 
thereof. 

The laboratory cloning was performed using one or more of the methods summarized 
below: 

SeqCalling Technology: cDNA was derived from various human samples 
representing multiple tissue types, normal and diseased states, physiological states, and 
developmental states from different donors. Samples were obtained as whole tissue, primary 
cells or tissue cultured primary cells or cell lines. Cells and cell lines may have been treated 
with biological or chemical agents that regulate gene expression, for example, growth factors, 
chemokines or steroids. The cDNA thus derived was then sequenced using CuraGen 
Corporation's SeqCalling technology which is disclosed in full in U.S. Ser. Nos. 09/417,386 
filed Oct. 13, 1999, and 09/614,505 filed July 11, 2000. Sequence traces were evaluated 
manually and edited for corrections if appropriate. cDNA sequences from all samples were 
assembled together, sometimes including public human sequences, using bioinformatics 
programs to produce a consensus sequence for each assembly. Each assembly is included in 
CuraGen Corporation's database. Sequences were included as components for assembly when 
the extent of identity with another component was at least 95% over 50 bp. Each assembly 
represents a gene or portion thereof and includes information on variants, such as splice forms, 
single nucleotide polymorphisms (SNPs), insertions, deletions and other sequence variations. 
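
The inclusion threshold stated above (at least 95% identity over 50 bp) can be illustrated with a minimal Python sketch. It is not CuraGen's assembly software: the function names are hypothetical, and the sketch assumes the two overlapping segments have already been aligned without gaps and trimmed to equal length.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def qualifies_as_component(overlap_a: str, overlap_b: str,
                           min_identity: float = 95.0,
                           min_length: int = 50) -> bool:
    """Apply the stated threshold: >= 95% identity over an overlap of >= 50 bp.

    The default values come from the text; the function itself is an
    illustrative assumption, not the disclosed implementation.
    """
    if len(overlap_a) < min_length:
        return False
    return percent_identity(overlap_a, overlap_b) >= min_identity


if __name__ == "__main__":
    reference = "ACGT" * 13  # 52 bp toy overlap
    # Introduce two mismatches (positions 0 and 25).
    candidate = "T" + reference[1:25] + "A" + reference[26:]
    print(round(percent_identity(reference, candidate), 2))
    print(qualifies_as_component(reference, candidate))
```

A real assembler would locate the overlap by alignment and would have to handle gaps; the sketch only shows how the two numeric thresholds combine.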

Variant sequences are also included in this application. A variant sequence can include 
a single nucleotide polymorphism (SNP). A SNP can, in some instances, be referred to as a 
"cSNP" to denote that the nucleotide sequence containing the SNP originates as a cDNA. A 
SNP can arise in several ways. For example, a SNP may be due to a substitution of one 
nucleotide for another at the polymorphic site. Such a substitution can be either a transition or 
a transversion. A SNP can also arise from a deletion of a nucleotide or an insertion of a 
nucleotide, relative to a reference allele. In this case, the polymorphic site is a site at which 
one allele bears a gap with respect to a particular nucleotide in another allele. SNPs occurring 
within genes may result in an alteration of the amino acid encoded by the gene at the position 
of the SNP. Intragenic SNPs may also be silent, when a codon including a SNP encodes the 
same amino acid as a result of the redundancy of the genetic code. SNPs occurring outside the 
region of a gene, or in an intron within a gene, do not result in changes in any amino acid 
sequence of a protein but may result in altered regulation of the expression pattern. Examples 
include alteration in temporal expression, physiological response regulation, cell type 
expression regulation, intensity of expression, and stability of transcribed message. 
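
The distinctions drawn above (transition versus transversion, and silent versus amino-acid-altering intragenic SNPs) can be sketched in Python as follows. The function names are hypothetical, the standard genetic code is assumed, and codon positions are taken to be 0-based.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine)
    or 'transversion' (purine<->pyrimidine)."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("ref and alt must be two different DNA bases")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def is_silent(codon: str, position: int, alt: str) -> bool:
    """True if substituting `alt` at 0-based `position` of `codon`
    leaves the encoded amino acid unchanged."""
    codon = codon.upper()
    mutated = codon[:position] + alt.upper() + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


if __name__ == "__main__":
    print(classify_substitution("A", "G"))  # purine to purine
    print(is_silent("GAA", 2, "G"))         # GAA and GAG both encode Glu
    print(is_silent("GAA", 1, "T"))         # GAA (Glu) to GTA (Val)
```

Insertion or deletion polymorphisms and regulatory effects of non-coding SNPs are outside the scope of this sketch.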
Presented information includes that associated with genomic clones, public genes and 
ESTs sharing sequence identity with the disclosed sequence and CuraGen Corporation's 
Electronic Northern bioinformatic tool. 



Examples 

Example A: Sequence related information 



The NOV1 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 1A. 



Table 1A. NOV1 Sequence Analysis 




SEQ ID NO: 1 


711 bp 


NOV1a, 

CG58522-01 DNA Sequence 


TGCAG AAT GAACCAAGG AG A CTCAAAC CC AG CAGCT ACTCCG CATG CGG CAG AAGAC A 
TTCAAGGAGATGACAGATGGATGTGTCAGCACAACAGATTTGTTTTGGACTGTAAAGA 
CAAACAGCCTGATGTACCATTTGCGGGAGGCTCCGTGGTGCAGTTACTGCAGCCATAT 
GAGATATGGCGAGAGCTTTTTTCCCCACTTCATGCACTGAATTTTGGAACTGGGGGAG 
ATACAACAAGACATGTTTTGTGGAGACTAAAGAGCGGAGAACTGGGGAATACTAAGCC 
T AAGGT CATTGTTTTCTGG CT AGG AAG AAACAAC CATG AAAAT ATGGCAG AAGAGGT A 
GCAGGTGGTATGGCGGCCATCGTACAACTTATCAACACAAGGCAGCCACAGGCCAAAA 
TCATTGTATTTGATCTGTTACCTCAAGGTGAGAAACCCAACCCTTTGAGGCAAAAGAA 
CGCCAAGGTGAACCCACTCGTCAAGATTTCGCTGCTGAAACTTACCAACGTGCAGCTC 
CTGGATACTGACAGGGGTTTCGTGCACTCCGACCGTGCCATCTCCTGCCACGACATGT 
TTGATTTTCTG C ATTTG ACAGG AGG TGG CT ACT C AAAGGTCTG CAAAC CCTTGAATG A 
ACTGATCATGCAGTTGTTGGAG^AAACACCTGAGGAGAAACAAACCACCATTGCCTGA 
CTGGCTCCCATGAGT 




ORF Start: ATG at 7 


ORF Stop: TGA at 694 




SEQ ID NO: 2 


229 aa MW at 25656.2kD 


NOV1a, 

CG58522-01 Protein Sequence 


MNQGDSNPAATPHAAEDIQGDDRWMCQHNRFVLDCKDKQPDVPFAGGSVVQLLQPYEI 
WRELFS PLHALNFGTGGDTTRHVLWRLKSGELGNTKPKVI VFWLGRNNHENMAE EVAG 
GMAAIVQLINTRQPQAKIIVFDLLPQGEKPNPLRQKNAKVNPLVKISLLKLTNVQLLD 
TDRGFVHSDRAISCHDMFDFLHLTGGGYSKVCKPLNELIMQLLEETPEEKQTTIA 
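
The ORF Start and ORF Stop positions reported in the sequence tables can be cross-checked against the reported protein lengths. The Python sketch below assumes that both coordinates are 1-based positions of the first base of the start codon and of the stop codon, respectively; that reading is an assumption, although it is consistent with every clone listed. The function name is hypothetical.

```python
def orf_codon_count(orf_start: int, orf_stop: int) -> int:
    """Number of amino-acid-encoding codons in an ORF.

    Assumes 1-based coordinates of the first base of the start codon
    (orf_start) and of the stop codon (orf_stop); the stop codon itself
    is not counted.
    """
    span = orf_stop - orf_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("stop codon is not in frame with the start codon")
    return span // 3


# (ORF start, ORF stop, reported protein length in aa), as given in the tables.
REPORTED = {
    "NOV1a": (7, 694, 229),
    "NOV2a": (9, 1425, 472),
    "NOV2b": (44, 1469, 475),
    "NOV2c": (31, 1426, 465),
    "NOV3a": (7, 1435, 476),
    "NOV4a": (9, 1563, 518),
    "NOV5a": (4, 1069, 355),
}

if __name__ == "__main__":
    for clone, (start, stop, reported_len) in REPORTED.items():
        computed = orf_codon_count(start, stop)
        status = "consistent" if computed == reported_len else "MISMATCH"
        print(f"{clone}: {computed} codons vs {reported_len} aa reported ({status})")
```

For NOV1a, for example, (694 - 7) / 3 = 229, which agrees with the 229 aa length given for SEQ ID NO: 2.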



Further analysis of the NOV1a protein yielded the following properties shown in Table 1B. 



Table 1B. Protein Sequence Properties NOV1a 

Psort analysis: 0.6500 probability located in cytoplasm; 0.2340 probability located in lysosome (lumen); 0.1000 probability located in mitochondrial matrix space; 0.0000 probability located in endoplasmic reticulum (membrane) 

SignalP analysis: No Known Signal Sequence Predicted 
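
Psort results of the kind listed in the protein sequence property tables can be read programmatically. The following Python sketch parses a result string into (location, score) pairs and picks the highest-scoring location. The regular expression and function names are illustrative assumptions based only on the text layout shown in this document, not on the Psort program's own output specification; the scores are treated as independent values, since they do not always sum to 1.

```python
import re
from typing import List, Tuple

# Matches entries such as "0.6500 probability located in cytoplasm".
PSORT_ENTRY = re.compile(r"(\d+\.\d+) probability located in ([^;]+)")


def parse_psort(text: str) -> List[Tuple[str, float]]:
    """Return (location, score) pairs in the order they appear in the text."""
    flat = " ".join(text.split())  # undo line wrapping
    return [(m.group(2).strip(), float(m.group(1)))
            for m in PSORT_ENTRY.finditer(flat)]


def top_location(text: str) -> str:
    """Location with the highest reported score (first one wins on ties)."""
    return max(parse_psort(text), key=lambda entry: entry[1])[0]


if __name__ == "__main__":
    nov1a = (
        "0.6500 probability located in cytoplasm; 0.2340 probability located "
        "in lysosome (lumen); 0.1000 probability located in mitochondrial "
        "matrix space; 0.0000 probability located in endoplasmic reticulum "
        "(membrane)"
    )
    for location, score in parse_psort(nov1a):
        print(f"{score:.4f}  {location}")
    print("highest score:", top_location(nov1a))
```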



A search of the NOV1a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 1C. 
Table 1C. Geneseq Results for NOV1a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV1a Residues / Match Residues | Identities / Similarities for the Matched Region | Expect Value 
AAB49433 | Human beta platelet activating factor acetylhydrolase - Homo sapiens, 229 aa. [US6146868-A, 14-NOV-2000] | 1..229 / 1..229 | 196/229 (85%) / 209/229 (90%) | e-114 
AAB49432 | Rat beta platelet activating factor acetylhydrolase - Rattus norvegicus, 229 aa. [US6146868-A, 14-NOV-2000] | 1..229 / 1..229 | 195/229 (85%) / 208/229 (90%) | e-114 
AAB49434 | Murine beta platelet activating factor acetylhydrolase - Mus musculus, 229 aa. [US6146868-A, 14-NOV-2000] | 1..229 / 1..229 | 192/229 (83%) / 205/229 (88%) | e-111 
AAB49436 | Bovine gamma platelet activating factor acetylhydrolase - Bos taurus, 232 aa. [US6146868-A, 14-NOV-2000] | 4..219 / 3..218 | 124/216 (57%) / 165/216 (75%) | 5e-74 
AAB49435 | Human gamma platelet activating factor acetylhydrolase - Homo sapiens, 231 aa. [US6146868-A, 14-NOV-2000] | 4..219 / 3..218 | 124/216 (57%) / 164/216 (75%) | 2e-73 



In a BLAST search of public sequence databases, the NOV1a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 1D. 
Table 1D. Public BLASTP Results for NOV1a 

Protein Accession Number | Protein/Organism/Length | NOV1a Residues / Match Residues | Identities / Similarities for the Matched Portion | Expect Value 
Q29459 | Platelet-activating factor acetylhydrolase IB beta subunit (EC 3.1.1.47) (PAF acetylhydrolase 30 kDa subunit) (PAF-AH 30 kDa subunit) (PAF-AH beta subunit) (PAFAH beta subunit) - Homo sapiens (Human), and, 229 aa. | 1..229 / 1..229 | 196/229 (85%) / 209/229 (90%) | e-114 
O35264 | Platelet-activating factor acetylhydrolase IB beta subunit (EC 3.1.1.47) (PAF acetylhydrolase 30 kDa subunit) (PAF-AH 30 kDa subunit) (PAF-AH beta subunit) (PAFAH beta subunit) (Platelet-activating factor acetylhydrolase alpha 2 subunit) (PAF-AH alpha 2) - Rattus norvegicus (Rat), 229 aa. | 1..229 / 1..229 | 195/229 (85%) / 208/229 (90%) | e-113 
Q61206 | Platelet-activating factor acetylhydrolase IB beta subunit (EC 3.1.1.47) (PAF acetylhydrolase 30 kDa subunit) (PAF-AH 30 kDa subunit) (PAF-AH beta subunit) (PAFAH beta subunit) - Mus musculus (Mouse), 229 aa. | 1..229 / 1..229 | 192/229 (83%) / 205/229 (88%) | e-111 
Q29460 | Platelet-activating factor acetylhydrolase IB gamma subunit (EC 3.1.1.47) (PAF acetylhydrolase 29 kDa subunit) (PAF-AH 29 kDa subunit) (PAF-AH gamma subunit) (PAFAH gamma subunit) - Bos taurus (Bovine), 232 aa. | 4..219 / 3..218 | 125/216 (57%) / 165/216 (75%) | 8e-74 
Q15102 | Platelet-activating factor acetylhydrolase IB gamma subunit (EC 3.1.1.47) (PAF acetylhydrolase 29 kDa subunit) (PAF-AH 29 kDa subunit) (PAF-AH gamma subunit) (PAFAH gamma subunit) - Homo sapiens (Human), 231 aa. | 4..219 / 3..218 | 124/216 (57%) / 164/216 (75%) | 7e-73 



PFam analysis predicts that the NOV1a protein contains the domains shown in 
Table 1E. 



Table 1E. Domain Analysis of NOV1a 

Pfam Domain | NOV1a Match Region | Identities / Similarities for the Matched Region | Expect Value 
PAF-AH: domain 1 of 1 | 7..221 | 150/215 (70%) / 186/215 (87%) | 6e-147 



Example 2. 

The NOV2 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 2A. 



Table 2A. NOV2 Sequence Analysis 




SEQ ID NO: 3 


1457 bp 


NOV2a, 

CG58520-01 DNA Sequence 


CGATTCCGATGGGTCCTTTGAAAGCTTTTCTCTTCTCCCCTTTTCTTCTGCGGAGTCA 
AAGT AGAGGGGTGAGGTTGGTC TTCTTGTTACTG AC CCTGCATTTGGG AAACGTTGAT 
AAGGCAGATGATGAAGATGATGAGGATTTAACGGTGAACAAAACCTGGGTCTTGGCCC 
CAAAAATTCATGAAGGAGATATCACACAAATTCTGAATTCATTGCTTCAAGGCTATGA 
CAATAAACTTCGTCCAGATATAGGAGTGAGGCCCACAGTAATTGAAACTGATGTTTAT 
GTAAACAGCATTGGACCAGTTGATCCAATTAATATGGAATATACAATAGATATAATTT 
TTGC CCAAACCTGGTTTG ACAGT CGTTTAAAATTCAAT AGTACCATGAAAGTG CTTAT 
GCTTAACAGTAATATGGTTGGAAAAATTTGGATTCCTGACACTTTCTTCAGAAACTCA 
AGAAAATCTGATGCTCACTGGATAACAACTCCTAATCGTCTGCTTCGAATTTGGAATG 
ATGGACGAGTTCTGTATACTCTAAGGAGATTGACAATTAATGCAGAATGTTATCTTCA 
GCTTCATAACTTTCCCATGGATGAACATTCCTGTCCACTGGAATTTTCAAGCTTCTCT 
ATAGATGG ATACC CTAAAAATGAAATTGAGTTATCAATGGAAGCG AAGTT CTGTGG AA 
GTGGG CGACACAAGAT CCGGAGATTAT ATCAGTTTGCATTTGT AGGGTT ACGG AACTC 
AACTGAAATCACTCACACGATCTCTGGGGATTATGTTATCATGACAATTTTTTTTGAC 
CTGAGCAGAAGAATGGGATATTTCACTATTCAGACCTACATTCCATGCATTCTGACAG 
TTGTTCTTTCTTGGGTGTCTTTTTGGATCAATAAAGATGCAGTGCCTGCAAGAACATC 
GTTGGGTATGACATCTATAGGTATCACTACAGTTCTGACTATGACAACCCTGAGTACA 
ATTGCCAGGAAGTCTTTACCTAAGGTTTCTTATGTGACTGCGATGGATCTCTTTGTTT 
CTGTTTGTTTCATTTTTGTTTTTGCAGCCTTGATGGAATATGGAACCTTGCATTATTT 
T ACCAG CAACCAAAAAGG AAAG ACTGCTACTAAAGACAG AAAG CTAAAAAATAAAGCC 
TCGACTCCTGGTCTCCATCCTGGATCCACTCTGATTCCAATGAATAATATTTCTGTGC 
CGCAAGAAGATGATTATGGGTATCAGTGTTTGGAGGGCAAAGATTGTGCCAGCTTCTT 
CTGTTGCTTTG AAG ACTG CAG AACAGG AT CTTGGAGGGAAGG AAGGATACACATACG C 
ATTGCCAAAATTGACTCTTATTCTAGAATATTTTTCCCAACCGCTTTTGCCCTGTTCA 
ACTTGGTTTATTGGGTTGGCTATCTTTACTTATAAAATCTACTTCATAAGCAAAAATC 
AAAAGAA 




ORF Start: ATG at 9 


ORF Stop: TAA at 1425 




SEQ ID NO: 4 


472 aa MW at 54100.9kD 


NOV2a, 

CG58520-01 Protein Sequence 


MGPLKAFLFSPFLLRSQSRGVRLVFLLLTLHIX3NVDKADDEDDEDLTWKTWVLAPKI 
HEGDITQILNSLLQGYDNKLRPDIGVRPTVIETDVYVNSIGPVDPINMEYTIDIIFAQ 
TWFDSRLKFNSTMKVLMLNSNMVGKI W I PDTFFRNSRKSDAHWI TTPNRLIjRI WNDGR 
VLYTLRRLTINAECYLQLHNFPMDEHSCPLEFSSFSIDGYPKNEIELSMEAKFCGSGR 
HKI RRLYQFAFVGLRNSTE ITHTI SGDYVIMTI FFDLSRRMGYFTIQTYI PCI LTWL 
SWVSFWINKDAVPARTSLGMTSIGITTVLTMTTLSTIARKSLPKVSYVTAMDLFVSVC 
FIFVFAALMEYGTLHYFTSNQKGKTATKDRKLKNKASTPGLHPGSTLIPMNNISVPQE 
DDYGYQCLEGKDCASFFCCFEDCRTGSWREGRIHIRIAKIDSYSRIFFPTAFALFNLV 
YWVGYLYL 




SEQ ID NO: 5 


1521 bp 


NOV2b, 

CG58520-02 DNA Sequence 


CAACCAAGAGGCAAGAGGCGAGAGAAGGAAAAAAAAAAAAGCGATGAGTTCGCCAAAT 


ATATGGAGCACAGGAAGCTCAGTCTACTCGACTCCrGTATTTTCACAGAAAATGACGG 
TGTGGATTCTGCTCCTGCTGT CGCTCTAC CCTGGCTTCACTAGCCAGAAATCTG ATGA 
TGACTATGAAGATTATGCTTCTAACAAAACATGGGTCTTGACTCCAAAAGTTCCTGAG 
GGTGATGTCACTGTCATCTTAAACAACCTGCTGGAAGGATATGACAATAAACTTCGGC 
CTGATATAGGAGTGAAGCCAACGTTAATTCACACAGACATGTATGTGAATAGCATTGG 
TCCAGTGAACGCTATCAATATGGAATACACTATTGATATATTTTTTGCGCAAACGTGG 
TATGACAGACGTTTGAAATTTAACAGCACCATTAAAGTCCTCCGATTGAACAGCAACA 
TGGTGGGGAAAATCTGGATTCCAGACACTTTCTTCAGAAATTCCAAAAAAGCTGATGC 
AC ACTGG ATCAC CACCCCCAACAGG ATGCTG AGAATTTGGAATGATGGTCG AGTG CTC 
TACACCCTAAGGTTGACAATTGATGCTGAGTGCCAATTACAATTGCACAACTTTCCAA 
TGGATGAACACTCCTGCCCCTTGGAGTTCTCCAGTTATGGCTATCCACGTGAAGAAAT 
TGTTTATCAATGGAAGCGAAGTTCTGTTGAAGTGGGCGACACAAGATCCTGGAGGCTT 
TATCAATT CTCATTTGTTGGTCTAAG AAAT ACCACCGAAGTAGTGAAG ACAACTTC CG 
GAGATTATGTGGTCATGTCTGTCTACTTTGATCTGAGCAGAAGAATGGGATACTTTAC 
CATCCAGACCTATATCCCCTGCACACTCATTGTCGTCCTATCCTGGGTGTCTTTCTGG 
ATCAATAAGGATGCTGTTCCAGCCAGAACATCTTTAGGTATCACCACTGTCCTGACAA 
TGACCACCCTCAGCACCATTGCCCGGAAATCGCTCCCCAAGGTCTCCTATGTCACAGC 
GATGGATCTCTTTGTATCTGTTTGTTTCATCTTTGTCTTCTCTGCTCTGGTGGAGTAT 
GGCACCTTGCATTATTTTGTCAGCAACCGGAAACCAAGCAAGGACAAAGATAAAAAGA 
AGAAAAACCCTCTTCTTCGGATGTTTTCCTTCAAGGCCCCTACCATTGATATCCGCCC 
AAGATCAGCAACCATTCAAATGAATAATGCTACACACCTTCAAGAGAGAGATGAAGAG 
TACGGCTATGAGTGTCTGGACGGCAAGGACTGTGCCAGTTTTTTCTGCTGTTTTGAAG 
ATTGTCGAACAGGAGCTTGGAGACATGGGAGGATACATATCCGCATTGCCAAAATGGA 
CTCCTATGCTCGGATCTTCTTCCCCACTGCCTTCTGCCTGTTTAATCTGGTCTATTGG 
GTCTCCTACCTCTACCTGTGAGGAGGTATGGGTTTTACTGATATGGTTCTTATTCACT 


GAGTCTCATGGAG 




ORF Start: ATG at 44 


ORF Stop: TGA at 1469 




SEQ ID NO: 6 


475 aa 


MW at 55184.9kD 


NOV2b, 

CG58520-02 Protein Sequence 


MSSPNIWSTGSSVYSTPVFSQKMTVWILLLIiSLYPGFTSQKSDDDYEDYASNKTWVLT 
PKVPEGDVTVILNNLLEGYDNKLRPDIGVKPTLIHTDMYVNSIGPVNAINMEYTIDIF 
FAQTWYD RRLKFNS T I KVLRLNSNMVG KI W I PDT PFRNS KKADAHWI TT PNRMLR I WN 
DGRVLYTLRLTIDAECQLQLHNFPMDEHSCPLEFSSYGYPREEIVYQWKRSSVEVGDT 
RSWRLYQFSFVGLRNTTEVVKTTSGDYVVMSVYFDLSRRMGYFTIQTYIPCTLIVVLS 
WVS FWI NKD AVP ARTS LG I TT VLTMTTLS T I ARKS LP KVS YVTAMDLFVS VC F I FVFS 
ALVE YGTLHYFVSNRKPSKDKDKKKKNPLLRMFSFKAPT IDI RPRS ATIQMNNATHLQ 
ERDEEYGYECLDGKDCASFFCCFEDCRTGAWRHGRIHIRIAKMDSYARIFFPTAFCLF 
NLVYWVSYLYL 




SEQ ID NO: 7 


1455 bp 


NOV2c, 

CG58520-03 DNA Sequence 


TAGTGCAGCACACGTAAAAAAGCGATTCCGATGGGTCCTTTGAAAGCTTTTCTCTTCT 
CCCCTTTTCTTCTGCGGAGTCAAAGTAGAGGGGTGAGGTTGGTCTTCTTGTTACTGAC 
CCTGCATTTGGGAAACTGGGTTGATAAGGCAGATGATGAAGATGATGAGGATTTAACG 
GTGAACAAAACCTGGGTCTTGGCCCCAAAAATTCATGAAGGAGATATCACACAAATTC 
TGAATTCATTGCTTCAAGGCTATGACAATAAACTTCGTCCAGATATAGGAGTGAGGCC 
CACAGTAATTGAAACTGATGTTTATGTAAACAGCATTGGACCAGTTGATCCAATTAAT 
ATGGAATATACAATAGATATAATTTTTGCCCAAACCTGGTTTGACAGTCGTTTAAAAT 
TCAAT AGTACCATGAAAGTG CTT ATG CTT AACAGTAAT ATGGTTGGAAAAATTTGGAT 
TCCTGACACTTTCTTCAGAAACTCAAGAAAATCTGATGCTCACTGGATAACAACTCCT 
AATCGTCTGCTTCGAATTTGGAATGATGGACGAGTTCTGTATACTCTAAGGTTGACAA 
TTAATGCAGAATGTTATCTTCAGCTTCATAACTTTCCCATGGATGAACATTCCTGTCC 
ACTGGAATTTTCAAGCGATGGATACCCTAAAAATGAAATTGAGTATAAGTGGAAAAAG 
CCCTCCGTAGAAGTGGCTGATCCTAAATACTGGAGATTATATCAGTTTGCATTTGTAG 
GGTTACGGAACTCAACTGAAATCACTCACACGATCTCTGGTGATTATGTTATCATGAC 
AATTTTTTTTGACCTGAGC^GAAGAATGGGATATTTCACTATTCAGACCTACATTCCA 
TG CATTCTGACAGTTGTTCTTTCTTGGGTGTCTTTTTGG ATCAAT AAAGATGCAGTG C 
CTGCAAGAACATCGTTGGGTATCACTACAGTTCTGACTATGACAACCCTGAGTACAAT 
TGCCAGGAAGTCTTTACCTAAGGTTTCTTATGTGACTGCGATGGATCTCTTTGTTTCT 
GTT TGTTTC ATTTTTG T TTTTGCAG CCTTG ATGG AAT ATGG AAC CTTG C ATTATTTT A 
CCAG C AACCAAAAAGG AAAG ACTGCTACTAAAGACAGAAAGCT AAAAAAT AAAGCCTC 
GGT AACT CCTGGT CTCCATCCTGGATCCACTCTGATTCCAATGAATAAT ATTTCTGTG 
C CG C AAG AAGATG ATT ATGGG T ATCAGTG TTTGG AGGG CAAAG ATTG TG CCAGCTTCT 
TCTGTTG CTTTG AAGACTG C AG AACAGGATCTTGG AGGGAAGGAAGGAT ACACAT ACG 
CATTGCCAAAATTGACTCTTATTCTAGAATATTTTTCCCAACCGCTTTTGCCCTGTTC 
AACTTGGTTTATTGGGTTGGCTATCTTTACTTATAAAATCTACTTCATAAGCAAAAAT 
CAAAA 




ORF Start: ATG at 31 


ORF Stop: TAA at 1426 




SEQ ID NO: 8 


465 aa 


MW at 53597.3kD 


NOV2c, 

CG58520-03 Protein Sequence 


MGPLKAFLFSPFLLRSQSRGVRLVFliLTLHLGNWVDKADDEDDEDLTVNKTWVLAPK 
IHEGDITQILNSLLQGYDNKLRPDIGVRPTVIETDVYVNSIGPVDPINMEYTIDIIFA 
QTWFDSRLKFNSTMKVLMLNSNMVGKIWIPDTFFRNSRKSDAHWITTPNRLLRIWNDG 
RVLYTLRLTINAECYLQLHNFPMDEHSCPLEFSSDGYPKNEIEYKWKKPSVEVADPKY 
WRLYQFAFVGLRNSTEITHTISGDYVIMTIFFDLSRRMGYFTIQTYIPCILTVVLSWV 
S FW INKDAVPARTSLGI TTVLTMTTLSTI ARKS LPKVS YVTAMDLFVS VCF I FVFAAL 
MEYGTLHYFTSNQKGKTATKDRKLKNKASVTPGLHPGSTLI PMNNISVPQEDDYGYQC 
LEGKDCASFFCCFEDCRTGSWREGRIHIRIAKIDSYSRI FFPTAFALFNLVYWVGYLY 
L 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 2B. 
Table 2B. Comparison of NOV2a against NOV2b through NOV2c. 

Protein Sequence | NOV2a Residues / Match Residues | Identities / Similarities for the Matched Region 
NOV2b | 24..472 / 27..475 | 311/458 (67%) / 352/458 (75%) 
NOV2c | 1..472 / 1..465 | 414/474 (87%) / 415/474 (87%) 



Further analysis of the NOV2a protein yielded the following properties shown in Table 2C. 



Table 2C. Protein Sequence Properties NOV2a 

Psort analysis: 0.6400 probability located in plasma membrane; 0.4600 probability located in Golgi body; 0.3700 probability located in endoplasmic reticulum (membrane); 0.1000 probability located in endoplasmic reticulum (lumen) 

SignalP analysis: Likely cleavage site between residues 38 and 39 



A search of the NOV2a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 2D. 



Table 2D. Geneseq Results for NOV2a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV2a Residues / Match Residues | Identities / Similarities for the Matched Region | Expect Value 
AAM41007 | Human polypeptide SEQ ID NO 5938 - Homo sapiens, 489 aa. [WO200153312-A1, 26-JUL-2001] | 24..472 / 49..489 | 334/451 (74%) / 379/451 (83%) | 0.0 
AAM39221 | Human polypeptide SEQ ID NO 2366 - Homo sapiens, 467 aa. [WO200153312-A1, 26-JUL-2001] | 24..472 / 27..467 | 334/451 (74%) / 379/451 (83%) | 0.0 
AAR83968 | GABA-A receptor gamma-3 subunit - Homo sapiens, 467 aa. [WO9529234-A1, 02-NOV-1995] | 24..472 / 5..467 | 300/472 (63%) / 356/472 (74%) | e-169 
AAW59048 | GABA-A receptor epsilon subunit related protein - Mammalia, 506 aa. [DE19644501-A1, 30-APR-1998] | 62..472 / 70..506 | 193/448 (43%) / 274/448 (61%) | e-102 
[not legible] | Human GABA receptor epsilon subunit - Homo sapiens, 506 aa. [WO9823742-A1, 04-JUN-1998] | 62..472 / 70..506 | 193/448 (43%) / 274/448 (61%) | e-102 



In a BLAST search of public sequence databases, the NOV2a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 2E. 



Table 2E. Public BLASTP Results for NOV2a 

Protein Accession Number | Protein/Organism/Length | NOV2a Residues / Match Residues | Identities / Similarities for the Matched Portion | Expect Value 
P23574 | Gamma-aminobutyric-acid receptor gamma-1 subunit precursor (GABA(A) receptor) - Rattus norvegicus (Rat), 465 aa. | 1..472 / 1..465 | 426/475 (89%) / 440/475 (91%) | 0.0 
Q9R0Y8 | Gamma-aminobutyric-acid receptor gamma-1 subunit precursor (GABA(A) receptor) - Mus musculus (Mouse), 465 aa. | 1..472 / 1..465 | 420/477 (88%) / 434/477 (90%) | 0.0 
JH0824 | gamma-aminobutyric acid A receptor gamma 1 chain precursor - chicken, 464 aa. | 16..472 / 12..464 | 390/463 (84%) / 416/463 (89%) | 0.0 
JH0316 | gamma-aminobutyric acid A receptor gamma 2 chain alternatively spliced precursor - mouse, 466 aa. | 24..472 / 26..466 | 336/451 (74%) / 380/451 (83%) | 0.0 
P18508 | Gamma-aminobutyric-acid receptor gamma-2 subunit precursor (GABA(A) receptor) - Rattus norvegicus (Rat), 466 aa. | 24..472 / 26..466 | 335/451 (74%) / 379/451 (83%) | 0.0 



PFam analysis predicts that the NOV2a protein contains the domains shown in the 
Table 2F. 
Table 2F. Domain Analysis of NOV2a 

Pfam Domain | NOV2a Match Region | Identities / Similarities for the Matched Region | Expect Value 
Neur_chan_LBD: domain 1 of 1 | 63..273 | 66/271 (24%) / 162/271 (60%) | 2.7e-56 
Cys-protease-3C: domain 1 of 1 | 363..369 | 4/7 (57%) / 6/7 (86%) | 5.2 
Neur_chan_memb: domain 1 of 1 | 280..466 | 44/297 (15%) / 164/297 (55%) | 1.2e-60 



Example 3. 

The NOV3 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 3A. 



Table 3A. NOV3 Sequence Analysis 




SEQ ID NO: 9 


1440 bp 


NOV3a, 

CG58518-01 DNA Sequence 


GAAGAGATGGTCCTGGCTTTCCAGTTAGTCTCCTTCACCTACATCTGGATCATATTGA 
AACCAAATGTTTGTGCTGCTTCTAACATCAAGATGACACACCAGCGGTGCTCCTCTTC 
AATGAAACAAACCTGGAAACAAGAAACTAGAATGAAGAAAGATGACAGTACCAAAGCG 
CGGCCTCAGAAATATGAGCAACTTCTC C AT ATAGAGG ACAACG ATTTCG CAATGAGAC 
CTGGATTTGGAGGTGAGTATTATCCTCTCAAAATTGGGTCTCCAGTGCCAGTAGGTAT 
AG ATG T CC ATG TTG AAAG C ATTG A CAG CATTT CAG AG ACTAACATGG T AAGTTTCT T C 
ATGGGATATGACTTTACAATGACTTTTTATCTCAGGCATTACTGGAAAGACGAGAGGC 
TCT CCTTT CCT AG CACAG CAAACAAAAG CATG A C ATTTG AT CAT AG ATTG AC CAG AAA 
GATCTGGGTGC CTGATATC TTTTTTGTCCACTCTAAAAG ATCCTTCATCCATG ATACA 
ACTATGGAGAATATCATGCTGCGCGTACACCCTGATGGAAACGTCCTCCTAAGTCTCA 
GGAGGATAACGGTTTCGGCCATGTGCTTTATGGATTTCAGCAGGTTTCCTCTTGACAC 
TCAAAATTCTTCTCTTGAACTGGAAAGCGCCTACAATGAGGATGACCTAATGCTATAC 
TGG AAACACGGAAACAAGT C CTTAAAT ACTG AAG AACATATGT CCCTTTCTCAGTTCT 
TCATTG AAGACTTCAGTG C ATCT AGTGGATTAG CTTTCTAT AG CAGCACAGGTTGGTA 
CAATAGGCTTTTCATCAACTTTGTGCTAAGGAGGCATGTTTTCTTCTTTGTGCTGCAA 
ACCTATTTCCCAGCCATATTGATGGTGATGCTTTCATGGGTTTCATTTTGGATTGACC 
GAAGAGCTGTTCCTGCAAGAGTTTCCCTGGGTGGAATCACCACAGTGCTGACCATGTC 
CACAATCATCACTGCTGTGAGCGCCTCCATGCCCCAGGTGTCCTACCTCAAGGCTGTG 
GATGTGTACCTGTGGGTCAGCTCCCTCTTTGTGTTCCTGTCAGTCATTGAGTATGCAG 
CTGTGAACTACCTCACCACAGTGGAAGAGCGGAAACAATTCAAGAAGACAGGAAAGGT 
ACAGATTTCTAGGATGTACAATATTGATGCAGTTCAAGCTATGGCCTTTGATGGTTGT 
TACCATGACAGCGAGATTGACATGGACCAGACTTCCCTCTCTCTAAACTCAGAAGACT 
TCATGAGAAGAAAATCGATATGCAGCCCCAGCACCGATTCATCTCGGATAAAGAGAAG 
AAAATCCCTAGGAGGACATGTTGGTAGAATCATTCTGGAAAACAACCATGTCATTGAC 
ACCTATTCTAGGATTTTATTCCCCATTGTGTATATCTTTATTTAATTT 




ORF Start: ATG at 7 


ORF Stop: TAA at 1435 




SEQ ID NO: 10 


476 aa 


MW at 55285.2kD 


NOV3a, 

CG58518-01 Protein Sequence 


MVLAFQLVSFTYIWIILKPNVCAASNIKMTHQRCSSSMKQTWKQETRMKKDDSTKARP 
QKYEQLLH I EDNDFAMRPG FGGE YYPLKIGS P VP VGIDVHVES IDS I SETNMVS FFMG 
YDFTMTFYLRHYWKDERLS F PS TANKS MTFDHRLTRKI WVPD I FFVHSKRSFI HDTTM 
ENIMIJlVHPrXSNVLLSU^ITVSAMCFrmFSRFPI^TQNCSLELESAYNEDDUILYWK 
HGNKSIOTEEHMSLSQFFIEDFSASSGLAFYSSTGWYNRLFINFVLRRHVFFFVLQTY 
F PA I LMVMLS WVS FW I DRRA V P ARVS LGG I TTVLTMS T 1 1 TA VSASM PQVS YLKAVD V 
YLV^SSLFVFLSVIEYAAVNYLTTVEERKQFKKTGKVQISRKYNIDAVQAMAFDGCYH 
DSEIDMDQTSLSLNSEDFMRRKSICSPSTDSSRIKRRKSIiGGHVGRIILENNHVIDTY 
SRILFPIVYIFI 
Further analysis of the NOV3a protein yielded the following properties shown in Table 3B. 



Table 3B. Protein Sequence Properties NOV3a 

Psort analysis: 0.6850 probability located in endoplasmic reticulum (membrane); 0.6400 probability located in plasma membrane; 0.4600 probability located in Golgi body; 0.2400 probability located in nucleus 

SignalP analysis: Likely cleavage site between residues 25 and 26 



A search of the NOV3a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 3C. 



Table 3C. Geneseq Results for NOV3a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV3a Residues / Match Residues | Identities / Similarities for the Matched Region | Expect Value 
AAU04467 | Human gamma-amino butyric acid (GABA) receptor protein #1 - Homo sapiens, 467 aa. [WO200153489-A1, 26-JUL-2001] | 1..474 / 1..456 | 454/475 (95%) / 454/475 (95%) | 0.0 
AAU04470 | Human gamma-amino butyric acid (GABA) receptor protein #4 - Homo sapiens, 420 aa. [WO200153489-A1, 26-JUL-2001] | 48..474 / 1..409 | 408/428 (95%) / 408/428 (95%) | 0.0 
AAU04468 | Human gamma-amino butyric acid (GABA) receptor protein #2 - Homo sapiens, 392 aa. [WO200153489-A1, 26-JUL-2001] | 1..393 / 1..377 | 370/394 (93%) / 370/394 (93%) | 0.0 
AAU04471 | Human gamma-amino butyric acid (GABA) receptor protein #5 - Homo sapiens, 345 aa. [WO200153489-A1, 26-JUL-2001] | 48..393 / 1..330 | 324/347 (93%) / 324/347 (93%) | e-180 
AAU04469 | Human gamma-amino butyric [...] Homo sapiens, 180 aa. [WO200153489-A1, 26-JUL-2001] | 1..192 / 1..177 | 176/192 / 176/192 (91%) | 2e-96 




In a BLAST search of public sequence databases, the NOV3a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 3D. 



Table 3D. Public BLASTP Results for NOV3a 

Protein Accession Number | Protein/Organism/Length | NOV3a Residues / Match Residues | Identities / Similarities for the Matched Portion | Expect Value 
P50573 | Gamma-aminobutyric-acid receptor rho-3 subunit precursor (GABA(A) receptor) - Rattus norvegicus (Rat), 464 aa. | 1..474 / 1..453 | 383/476 (80%) / [...] (85%) | 0.0 
Q9YGQ2 | GAMMA-AMINOBUTYRIC-ACID RECEPTOR RHO-3 SUBUNIT - Morone americana (White perch), 470 aa. | 1..474 / 4..459 | 293/485 (60%) / 363/485 (74%) | e-153 
P50572 | Gamma-aminobutyric-acid receptor rho-1 subunit precursor (GABA(A) receptor) - Rattus norvegicus (Rat), 474 aa. | 49..474 / 58..463 | 270/427 (63%) / 317/427 (74%) | e-144 
P56475 | Gamma-aminobutyric-acid receptor rho-1 subunit precursor (GABA(A) receptor) - Mus musculus (Mouse), 474 aa. | 49..474 / 58..463 | 270/427 (63%) / 317/427 (74%) | e-143 
P24046 | Gamma-aminobutyric-acid receptor rho-1 subunit precursor (GABA(A) receptor) - Homo sapiens (Human), 473 aa. | 49..474 / 57..462 | 268/427 (62%) / 317/427 (73%) | e-143 



PFam analysis predicts that the NOV3a protein contains the domains shown in the 
Table 3E. 



Table 3E. Domain Analysis of NOV3a 

Pfam Domain | NOV3a Match Region | Identities / Similarities for the Matched Region | Expect Value 
Neur_chan_LBD: domain 1 of 1 | 88..282 | 70/250 (28%) / 165/250 (66%) | 1.2e-54 
Neur_chan_memb: domain 1 of 1 | 289..475 | 44/292 (15%) / 141/292 (48%) | 7.6e-28 



Example 4. 

The NOV4 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 4A. 



Table 4A. NOV4 Sequence Analysis 




SEQ ID NO: 11 


1587 bp 


NOV4a, 

CG58516-01 DNA Sequence 


G AACAG AAATGAAT AAAAGTCG CTGG CAG AGTAG AAG ACGAC ATGGG AG AAGAAGCC A 
CCAGCAGAACCCTTGGTTCAGACTCCGTGATTCTGAAGACAGGTCTGACTCCCGGGCA 
G CACAG CC CGCTCACGATTC CGGCCACGGTG ATG ACG AGTCTCCGTCAAC CTCGTCTG 
GCACAG CTGGGACCTCCT CTGTGCCAGAGCT ACCTGGGTTTTACTTTG ACCCTGAAAA 
GAAACGCTACTTCCGCTTGCTCCCTGGACATAACAACTGCAACCCCCTGACGAAAGAG 
AGCATC CGG CAGAAGGAG ATGG AG AGCAAGAG ACTGCGGCTGCTCCAGG AAG AAG AC A 
GACGGAAAAAGATTGCCAGGATGGGATTTAATGCATCTTCCATGCTACGAAAAAGCCA 
GCTGGGTTTTCTCAACGTCACCAATTACTGCCATTTAGCCCACGAGCTGCGTCTCAGC 
TGCATGGAG AGGAAAAAGGTCCAGATTCG AAG CATGGATCCCTCCG CCTTGGCAAG CG 
ACCGATTTAACCTCATACTGGCAGATACCAACAGTGACCGGCTCTTCACAGTGAACGA 
TGTTACAGTTGGAGGCTCCAAGTATGGTATCATCAACCTGCAAAGTCTGAAGACCCCT 
ACGCTCAAGGTGTTCATGCCACGAAAACCTCCGATTCTCACCAACCGGAAGGTGAACA 
CTTCGGTGTGCTGGGCCTCGCTGAATCACTTGGATTCCCACATTCTGCTATGCCTCAT 
GGGACTCGCAGAGACTCCAGGCTGTGCCACCCTGCTCCCAGCATCACTGTTCGTCAAT 
AGTCCCCACCCAGGAATAGACCGGCCTGGCATGCTCTGCAGTTTCCGGATCCCTGGGG 
GTGCCTGGTCCTGTGCCTGGTCCCTGAATATCCAAGCAAATAACTGCTTCAGTACAGG 
CTTGTCTCGGCGGGTCCTGTTGACCAACGTGGTGACGGGACACCGGCAGTCCTTTGGG 
ACCAACAGTGATGTCTTGGCCCAGCAGTTTGCTCTCATGGCTCCTCTGCTGTTTAATG 
GCTGCCGCTCTGGGGAAATCTTTG CCATTG ATCTG CGTTGTGGAAATCAAGGCAAGGG 
ATGGAAGGCCACCCGCCTGTTTCATGATTCAGCAGTGACCTCTGTGCGGATCCTCCAA 
GATGAGCAATACCTGATGGCTTCAGACATGGCTGGAAAGATCAAGCTGTGGGACCTGA 
GGACCACGAAGTGCGTAAGGCAGTACGAAGGCCACGTGAATGAGTACGCCTACCTGCC 
CCTGCATGTGCACG AGGAAG AAGG AATC CTGGTGG CAGTGGG CC AGG ACTGCTAC ACG 
AGAATCTGGAGCCTCCACGATGCCCGCCTACTGAGAACCATACCCTCCCCGTACCCTG 
CCTCCAAGGCCGACATTCCCAGTGTGGCCTTCTCGTCGCGGCTGGGGGGCTCCCGGGG 
GCGCGCCGGGGCTGCTCATGGCTGTCGGGCAGGACCTTTACTGTTACTCCTACAGCTA 
ATTCTGCAGGGCACAGCCCAGAGCCATGTGGATTTGACTTACGGGAGTAAAGCGTAAC 
TTTTTACTGCATCTAATGAGG 




ORF Start: ATG at 9 


ORF Stop: TAA at 1563 




SEQ ID NO: 12 


518 aa 


MW at 57769.3kD 


NOV4a, 

CG58516-01 Protein Sequence 


MNKSRWQSRRRHGRRSHQQNPWFRLRDSEDRSDSRAAQPAHDSGHGDDESPSTSSGTA 
GTSSVPELPGFYFDPEKKRYFRLLPGHNNCNPLTKESIRQKEMESKRLRLLQEEDRRK 
KIARMGFWASSMLRKSQLGFLNVTlTYCHLAHELRLSCMERKKVQIRSMDPSAliASDRF 
NLILADTNSDRLFTVNDVTVGGSKYGI INliQSLKTPTLKVFMPRKPPI LTNRKVNTSV 
CWASLNHLDSHI LLCLMGLAETPGCATLLPASLFVNS PHPGIDRPGMLCSFRI PGGAW 
SCAWSLNIQANNCFSTGLSRRVLLTNVVTGHRQSFGTNSDVLAQQFALMAPLLFNGCR 
S G E I F AI DLRCGNQGKG WKATRL FHDS A VTS VR I LQD E QYLMAS DMAG K I KLWD LRTT 
KC VRQ YEGHVNE YAYL P LHVH E E EG I LVA VGQDC YTRI WS LHDARLLRT I PS P YPAS K 
AD I PSVAFS SRLGGSRGRAGAAHGCRAG PLLLLLQLI LQGTAQSHVDLTYGS KA 



Further analysis of the NOV4a protein yielded the following properties shown in Table 4B. 
Table 4B. Protein Sequence Properties NOV4a 

Psort analysis: 0.9600 probability located in nucleus; 0.4776 probability located in mitochondrial matrix space; 0.3000 probability located in microbody (peroxisome); 0.1837 probability located in mitochondrial inner membrane 

SignalP analysis: No Known Signal Sequence Predicted 



A search of the NOV4a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 4C. 



Table 4C. Geneseq Results for NOV4a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV4a Residues / Match Residues | Identities / Similarities for the Matched Region | Expect Value 
ABB11794 | Human secreted protein homologue SEQ ID NO:2164 - Homo sapiens, 500 aa. [WO200157188-A2, 09-AUG-2001] | 1..484 / 5..485 | 470/484 (97%) / 471/484 (97%) | 0.0 
AAM79804 | Human protein SEQ ID NO 3450 - Homo sapiens, 500 aa. [WO200157190-A2, 09-AUG-2001] | 1..484 / 5..485 | 470/484 (97%) / 471/484 (97%) | 0.0 
AAM41122 | Human polypeptide SEQ ID NO 6053 - Homo sapiens, 500 aa. [WO200153312-A1, 26-JUL-2001] | 1..484 / 5..485 | 470/484 (97%) / 471/484 (97%) | 0.0 
AAG67256 | Amino acid sequence of a human liver-associated gene - Homo sapiens, 489 aa. [WO200109318-A1, 08-FEB-2001] | 1..484 / 1..474 | 459/484 (94%) / 462/484 (94%) | 0.0 
AAB94587 | Human protein sequence SEQ ID NO:15389 - Homo sapiens, 489 aa. [EP1074617-A2, 07-FEB-2001] | 1..484 / 1..474 | 459/484 (94%) / 462/484 (94%) | 0.0 



In a BLAST search of public sequence databases, the NOV4a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 4D. 



Table 4D. Public BLASTP Results for NOV4a 

Protein Accession Number | Protein/Organism/Length | NOV4a Residues / Match Residues | Identities / Similarities for the Matched Portion | Expect Value 
AAH18979 | HYPOTHETICAL 55.7 KDA PROTEIN - Homo sapiens (Human), 495 aa. | 1..484 / 1..480 | 470/484 (97%) / 471/484 (97%) | 0.0 
Q96K22 | CDNA FLJ14839 FIS, CLONE OVARC1001791 - Homo sapiens (Human), 489 aa. | 1..484 / 1..474 | 459/484 (94%) / 462/484 (94%) | 0.0 
Q9Y4P5 | HYPOTHETICAL 48.5 KDA PROTEIN - Homo sapiens (Human), 430 aa (fragment). | 5..435 / 2..428 | 420/431 (97%) / 421/431 (97%) | 0.0 
Q99LF7 | HYPOTHETICAL 58.1 KDA PROTEIN - Mus musculus (Mouse), 519 aa. | 1..484 / 1..481 | 378/485 (77%) / 423/485 (86%) | 0.0 
Q9UFI0 | HYPOTHETICAL 26.0 KDA PROTEIN - Homo sapiens (Human), 234 aa (fragment). | 269..483 / 4..217 | 175/215 (81%) / 193/215 (89%) | 4e-99 



PFam analysis predicts that the NOV4a protein contains the domains shown in the 
Table 4E. 



Table 4E. Domain Analysis of NOV4a 

Pfam Domain | NOV4a Match Region | Identities / Similarities for the Matched Region | Expect Value 
WD40: domain 1 of 3 | 281..316 | 2/37 (5%) / 26/37 (70%) | 5.8e+02 
WD40: domain 2 of 3 | 367..402 | 10/37 (27%) / 27/37 (73%) | 6.1 
WD40: domain 3 of 3 | 408..446 | 10/39 (26%) / 23/39 (59%) | 13 



Example 5. 



The NOV5 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 5A. 



Table 5A. NOV5 Sequence Analysis 
SEQ ID NO: 13 


1081 bp 


NOV5a, 

CG58473-01 DNA Sequence 


AGG ATOG CCCAGAAGGAGAAC AGTTATCCCTGG CCCTATGGCAAGCAG ACGGCTCCAG 
CCGGCCTGAGTACCCTGCTCCCGCGAGTCCTCCCGAGGATCCCCACCGAAGCTGCGCG 
TG AGCTCCCGAGCTGCG CAGACCC ACAGCCCGCAG CGGCC CCTGGCCATG AGGTGGT A 
GAGAACAGTTGTGGGAAGCGCAGCATCTTAACGCGGCCCTTCCTGGTCGACGACCTTG 
AGACTGGGCGTCCCCTGGGCAAAGACAAGTTTGTACATGTGTACTTGGCTCGAAAGAA 
GACAAG CCATTTCATCGTGG CCCTC AAGGCCTTCAAGT CTCAGATAGAGGAGGGCGTG 
GAGCACCAGATGCGCAGGCAGATGGAAATCCAGGCCCCCTTTCAGCATCCCAACATAT 
TGAGTCTCTACAACTATTTTTATGACCTGAGAAAAATCTACTGGATTCTAGAGTACGC 
CCCCGCCACCCCTACCCCCGAGGAGCTGTACCAGGAGCTGCGAAAGAGCCGCACCTTT 
G ACAAGAAGCCAAC AGCC ACCAT C ACGGGGGAGGTGGCAG ATG CTCTG ATGTACTGCC 
ACGGGAAG AAGGTG ACTCCCAG AG ACATG AAGC CAG ATAATCTACT CTCAGGGCTTG A 
GGGCGAGCTGAAAGTTGCCGACTTCGGCTGCCCTGTGCACGCCCCCTCACTGAGGAGG 
AAGAC AAGACAAATGTGTGG CACCCTGGACT AC CTGTCCCC AGAGACAATTGAGGGGC 
GCGCGCACACCGAGAAGGTGGATTTGTGGTACATCGGAGCACTCGGCTATGAGCCGCT 
GGTGGGGAACCCCACACACAATGAGGC CT ATGGGCG AATCGTCAAGGTGG C CCTAAAA 
TTCCCCCTTCTGTGCCCAGGAGAGCCCCAGGACCTCATCTCCAAGCTGCTTAGGCATA 
ACCCCTCAGAACGGCTGCCCCTGGCCCAGGTCTCAGCCCACCCTGGGATCCTGGCCCA 
TTCTCGGAGGGTTTTGCCTCCCTCTGCCCATCAGTCTGTCCCCTGGTGGTCCCTGACA 
TTCACT CGGGGGCGTCTGTGTTTGT AAGTCTG CATAT 




ORF Start: ATG at 4 


ORF Stop: TAA at 1069 




SEQ ID NO: 14 


355 aa 


MW at 40012.7kD 


NOV5a, 

CG58473-01 Protein Sequence 


MAQKENSYPWPYGKQTAPAGLSTLLPRVLPRIPTEAARELPSCADPQPAAAPGHEWE 
NSCGKRSILTRPFLVDDLETGRPLGKDKFVHVYLARKKTSHFIVALKAPKSQIEEGVE 
HQMRRQME I QAPFQH PNI LS L YNY F YD LRK I YW I LE Y APAT PTP E E L YQE LRKS RT PD 
KKPTATITGEVADALMYCHGKKVTPRDMKPDNLLSGLEGELFCVADFGCPVHAPSLRRK 
TRQMCGTLDYLSPETIEGRAHTEKVDLWYIGALGYEPLVGNPTHNEAYGRIVKVALKF 
PLLCPGEPQDLISKLLRHNPSERLPLAQVSAHPGILAHSRRVLPPSAHQSVPWWSLTF 
TRGRLCL 
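
The ORF coordinates listed in Table 5A can be cross-checked against the stated protein length. The following is a minimal sketch, assuming that positions are 1-based and that the "ORF Stop" position names the first base of the stop codon; this convention is an assumption, although it is consistent with the values listed for NOV5a. The function name `orf_codon_count` is hypothetical.

```python
def orf_codon_count(start: int, stop: int) -> int:
    """Return the number of sense codons in an ORF.

    start: 1-based position of the first base of the start codon.
    stop:  1-based position of the first base of the stop codon
           (assumed convention, not stated in the source).
    """
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3


# Values listed in Table 5A for NOV5a: ATG at 4, TAA at 1069, 355 aa.
nov5a_codons = orf_codon_count(4, 1069)
print(nov5a_codons)
```

Under the stated assumption, the codon count should equal the listed number of amino acids; a mismatch would suggest a transcription error in one of the three values.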



Further analysis of the NOV5a protein yielded the following properties shown in Table 

5B. 



Table 5B. Protein Sequence Properties NOV5a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in 
microbody (peroxisome); 0.1897 probability located in lysosome (lumen); 
0.1000 probability located in mitochondrial matrix space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV5a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 5C. 



Table 5C. Geneseq Results for NOV5a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV5a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG67615 


Amino acid sequence of a human 
protein - Homo sapiens, 344 aa. 
[WO200109316-A1, 08-FEB-2001] 


1..341 
1..343 


247/349(70%) 
274/349 (77%) 


e-129 
AAG67436 


Amino acid sequence of a human 
polypeptide - Homo sapiens, 344 aa. 
[WO200109345-A1, 08-FEB-2001] 


1..341 
1..343 


247/349 (70%) 
274/349 (77%) 


e-129 


AAY22475 


Human AUR1 protein sequence - 
Homo sapiens, 344 aa. 
[WO9937788-A2, 29-JUL-1999] 


1..341 
1..343 


247/349 (70%) 
274/349 (77%) 


e-129 


AAW18083 


Human Aurora-1 - Homo sapiens, 
344 aa. [WO9722702-A1, 26-JUN- 
1997] 


1..341 
1..343 


247/349 (70%) 
274/349 (77%) 


e-129 


AAY2/U52 


Human protein kinase (HPKM)-1 
(clone ID 2940) - Homo sapiens, 
347 aa. [WO9938981-A2, 05-AUG- 
1999] 


1..341 
1..346 


246/352 (69%) 
274/352(76%) 


e-127 
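
The "Identities/Similarities" entries in the tables above take the form `matched/aligned (percent%)`. A minimal sketch of reading such an entry back into numbers is given below so that the listed percentage can be compared with the underlying ratio. The function name `parse_match` and the regular expression are illustrative assumptions, and no particular rounding rule is assumed for the percentages printed in the tables.

```python
import re

# Matches entries such as "247/349(70%)" or "246/352 (69%)".
ENTRY = re.compile(r"(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)")


def parse_match(entry: str) -> tuple[int, int, int]:
    """Return (matched residues, aligned length, listed percent)."""
    m = ENTRY.fullmatch(entry.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


matched, aligned, listed_pct = parse_match("247/349(70%)")
# Unrounded ratio, for comparison with the listed percentage.
print(matched, aligned, listed_pct, matched / aligned)
```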



In a BLAST search of public sequence databases, the NOV5a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 5D. 



Table 5D. Public BLASTP Results for NOV5a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV5a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


O60446 


AURORA-RELATED KINASE 2 
(SERINE/THREONINE KINASE 
12) - Homo sapiens (Human), 344 aa. 


1..341 
1..343 


247/349 (70%) 
274/349 (77%) 


e-128 


Q96GD4 


UNKNOWN (PROTEIN FOR 
MGC:11031) - Homo sapiens 
(Human), 344 aa. 


1..341 
1..343 


247/349 (70%) 
274/349 (77%) 


e-128 


Q96DV5 


UNKNOWN (PROTEIN FOR 
MGC:4243) - Homo sapiens 
(Human), 345 aa. 


1..341 
1..344 


247/350 (70%) 
274/350 (77%) 


e-126 


Q9UQ46 


AIK2 - Homo sapiens (Human), 343 
aa. 


1..341 
1..342 


245/348 (70%) 
272/348(77%) 


e-126 


O14630 


PROTEIN KINASE - Homo sapiens 
(Human), 347 aa. 


1..341 
1..346 


245/352 (69%) 
272/352 (76%) 


e-125 



Pfam analysis predicts that the NOV5a protein contains the domains shown in 
Table 5E. 



Table 5E. Domain Analysis of NOV5a 


Pfam Domain 


NOV5a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect 
Value 


Pkinase: domain 1 of 
1 


76..325 


81/293 (28%) 
184/293 (63%) 


6.5e-36 



Example 6. 

The NOV6 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 6A. 



Table 6A. NOV6 Sequence Analysis 




SEQ ID NO: 15 


1524 bp 


NOV6a, 

CG58470-01 DNA Sequence 


AGCATTATOAACACTAATGACCTTAAACTCAGGTTGTCCAAAGCTGAGCAAGAACACC 
CACTACGTTTCTGGAATGAGCTTGAAGAAGCCCGACAGGTAGAACTTTATGCAGAGCT 
CCAGGCCAT CG ACTTTC AGGAACTG AACTTCTTTTT CCAAAAGG CCATTGAAGGATTT 
AACCAGTCCTCTCATCAAGAAAAGGTGGATGCGGGAATGGAACCTGTCCCTCGAGAAG 
T ACTGGGCAGTG CTGCAGGGAAGCT AG ATCAGCTCCAGGC CTGGGAAAGC AAAGTTTT 
CCAGATTTCTGAGAACAAAGTCACAGTTGTTCTAGCTGGTGGGCAGGGGACTAGACTC 
GTTGCATATCCAAAGGGGATGTATGATGTTGGTTTGCCATCCCATAAGACACTTTTTC 
AG ATT C AAGCAGAGCATAT C CTG AAG CT ACAACAG TTAG CTGAAAAAT ATT ATGG CAA 
CAAATGCATTATTCCATATTACGTCATGACCAGCGAGTTCACTCTGGGGCCCACGGCC 
GAGTTCTTCAGGGAGCACAACTTCTTCCACCTGGACCCCGCCAACGTGGTCATGTTTG 
AGCAGCGCCTG CTGCCTGCTGTGACCTTTG ATGGCAAGGTTATCCTGGAG CGGAAAG A 
CAAAGTTGCCATGGCCCCAGACGGCAACGGGGGCCTCTACTGCGCGCTGGAGGACCAC 
AAG AT C CTGGAGG ACATGG AG C G CCGGGG AG TGG AGTTTGTGCACGTGTA CTGTGTGG 
ACAACATCCTGGTGCGGCTGGCGGACCCTGTCTTCATCGGCTTCTGTGTGTTGCAGGG 
CGCAGACTGTGGCGCCAAGGTGGTGGAAAAGGCATACCCCGAGGAGCCCGTGGGCGTG 
GTGTGCCAGGTGGACGGTGTCCCCCAGGTGGTGGAGTACAGCGAGATCAGTCCTGAGA 
CCGCACAGCTACGTGTCTCCGACGGGAGCCTGCTGTACAATGCAGGCAACATCTGCAA 
CCACTTCTTCACCCGAGGCTTCCTTAAGGCGGTCACCAGGGAGTTTGAGCCTTTGCTG 
AAG C CACACGTG G CTGTG AAG AAGG TC CCG T ATG TGG ATG AGG AGGGG AATCTGG T AA 
AGCCGCTAAAACCGAACGGGATAAAGATGGAGAAGTTTGTGTTTGATGTGTTCCGGTT 
TGCTAAGAACTTTGCTGCCTTGGAAGTGCTGCGGGAGGAGGAATTTTCCCCACTGAAG 
AACGCAGAGCCAGCCGACAGGGACAGTCCCCGCACCGCTCGCCAGGCCCTGCTCACCC 
AGCACTACCGGTGGGCTCTGCGGGCCGGGGCCCGCTTCCTGGATGCCCATGGGGCCCG 
GCTCCCAGAGCTGCCCAGCTTGCCCCCAAATGGAGACCCTCCGGCCATCTGTGAGATA 
TCGCCCTTGGTGTCTTACTCTGGAGAGGGTTTAGAAGTGTACCTGCAAGGCCGGGAGT 
TCCAGTCCCCGCT CAT CCTGGATGAAGACCAGGCCAGGG AGCTGGTG AAAAATGGTAT 
ATGAACCTGATAC CAA 




ORF Start: ATG at 7 


ORF Stop: TGA at 1510 




SEQ ID NO: 16 


501 aa 


MW at 56461.0kD 


NOV6a, 

CG58470-01 Protein Sequence 


MNTNDLKLRLSKAEQEHPLRFWNELEEARQVELYAELQAIDFQELNFFFQKAIEGPNQ 
SSHQEKVDAGMEPVPREVLGSAAGKI^QLQAWESKVFQISENKVTVVLAGGQGTRLVA 
YPKGMYDVGLPSHKTLFQIQAEHILKLQQLAEKYYGNKCIIPYYVMTSEFTLGPTAEF 
FREHNFFHLDPANWMFEQRLLPAVTFDGKVILERKDKVAMAPDGNGGLYCALEDHKI 
LEDMERRGVEFVHVYCVDNILWLADPVFIGFCVLQGADCGAKVVEKAYPEEPVGWC 
QVDGVPQWEYSEISPETAQLRVSDGSLLYNAGNICNHFFTRGFLKAVTREFEPLLKP 
HVAVKKVPYVDEEGNLVKPLKPNGIKMEKFVFDVFRFAKNFAALEVLREEEFSPLKNA 
EPADRDSPRTARQALLTQHYRWALRAGARFLDAHGARLPELPSLPPNGDPPAICEISP 
L VS YSGEGLE VYLQGRE FQS PLI LD E DQ ARE L VKNG I 



Further analysis of the NOV6a protein yielded the following properties shown in Table 

6B. 



Table 6B. Protein Sequence Properties NOV6a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3490 probability located in 
matrix space; 0.1000 probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV6a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 6C. 



Table 6C. Geneseq Results for NOV6a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV6a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 




Human prostate cancer antigen protein 
sequence SEQ ID NO: 1538 - Homo 
sapiens, 524 aa. [WO200055174-A1, 
21-SEP-2000] 


1..501 
3..524 


353/522 (67%) 
413/522 (78%) 


0.0 




Arabidopsis thaliana protein fragment 
SEQ ID NO: 39067 - Arabidopsis 
thaliana, 502 aa. [EP1033405-A2, 06- 
SEP-2000] 


36..497 


1 )7Hl Hoy {J y /o) 

275/489 (55%) 


Op OA 


AAG40236 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 49896 - Arabidopsis 
thaliana, 477 aa. [EP1033405-A2, 06- 
SEP-2000] 


9..485 
12..472 


193/488 (39%) 
272/488 (55%) 


3e-82 


AAG40235 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 49895 - Arabidopsis 
thaliana, 500 aa. [EP1033405-A2, 06- 
SEP-2000] 


9..485 
35..495 


193/488 (39%) 
272/488 (55%) 


3e-82 


AAG40234 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 49894 - Arabidopsis 
thaliana, 505 aa. [EP1033405-A2, 06- 
SEP-2000] 


9..485 
40..500 


193/488 (39%) 
272/488(55%) 


3e-82 



In a BLAST search of public sequence databases, the NOV6a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 6D. 



Table 6D. Public BLASTP Results for NOV6a 
Protein 
Accession 
Number 


Protein/Organism/Length 


NOV6a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96GM2 


UDP-N-ACETYLGLUCOSAMINE 
PYROPHOSPHORYLASE 1 - Homo 
sapiens (Human), 505 aa. 


1..501 
1..505 


351/505(69%) 
412/505 (81%) 


0.0 


Q16222 


UDP-N-acetylhexosamine 
pyrophosphorylase (Antigen X) (AGX) 
(Sperm-associated antigen 2) [Includes: 
UDP-N-acetylgalactosamine 
pyrophosphorylase (EC 2.7.7.-) (AGX-1); 
UDP-N-acetylglucosamine 
pyrophosphorylase (EC 2.7.7.23) (AGX-2)] 
- Homo sapiens (Human), 522 aa. 


1..501 
1..522 


352/522 (67%) 
412/522 (78%) 


0.0 


Q91YN5 


HYPOTHETICAL 58.6 KDA PROTEIN 
- Mus musculus (Mouse), 522 aa. 


1..501 
1..522 


342/522 (65%) 
407/522 (77%) 


0.0 


AAH17547 


HYPOTHETICAL 58.5 KDA PROTEIN 
- Mus musculus (Mouse), 521 aa. 


1..501 
1..521 


341/521 (65%) 
407/521 (77%) 


0.0 


Q9Y0Z0 


BCDNA:LD24639 PROTEIN - 
Drosophila melanogaster (Fruit fly), 520 
aa. 


6..492 
44..513 


236/491 (48%) 
330/491 (67%) 


e-124 



Pfam analysis predicts that the NOV6a protein contains the domains shown in 
Table 6E. 



Table 6E. Domain Analysis of NOV6a 


Pfam Domain 


NOV6a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


UDPGP: domain 1 
of 1 


40..434 


108/428 (25%) 
324/428 (76%) 


8.4e-111 



Example 7. 

The NOV7 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 7A. 



Table 7A. NOV7 Sequence Analysis 




SEQ ID NO: 17 461 bp 


NOV7a, 

CG58593-01 DNA Sequence 


ACGCAGAGATGCAGATCTTTGTGAAGACCCTCACGGGCAAGACCATCACCCTTGAGGT 
CAAGCCCACCGACACCATTCAGAATX3TCAAAACCAAAATTCAGGACAAGGAGGGTATC 
CCACCTG ACCAG CAG CGTCTG ATATTTGCTGGG AAACGGCTGGAGGATGGCCACACTC 
TCTCAGGCTACAACATCCAGAAAGAGTCCACCCTAAACCTGGTGCTGCGCCTGCGAGG 
TCGC^TTACTGAGCCTTCCCTC CG CCAG CT CGTCCAGAAAT ACAACTGCGACGAGATG 
ATCTGCTGCAAGTGCTATGCTTGCCTGCACCCCGGTGCTATCAACTGCCACAAGAAGA 
AATGCGGCCACACCAACAACCTGTACCCCAGGAAGAAGGTCAAATAAGGCTCTTCCTT 
CCTTGAAGGGCAGCAGCCTTCTGCCCAGGCCCCATGGCCCTGGGGCCTCAATAAA 




ORF Start: ATG at 9 


ORF Stop: TAA at 393 




SEQ ID NO: 18 


128 aa MW at 14540.9kD 


NOV7a, 

CG58593-01 Protein Sequence 


MQIFVKTLTGKTITLEVKPTDTIENVKTKIQDKEGIPPDQQRLIFAGKRLEDGHTLSG 
YNIQKESTLNLVLRLRGGITEPSLRQLVQKYNCDEMICCKCYACLHPGAINCHKKKCG 
HTNNLYPRKKVK 
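
The relationship between the listed ORF start and the predicted polypeptide can be illustrated by translating the first few codons of the NOV7a DNA sequence. The following is a minimal sketch, assuming a 1-based ORF start position and the standard genetic code; the codon table is deliberately partial (only the codons needed here), and the function name `translate_prefix` is hypothetical.

```python
# Partial standard codon table: only the codons used in this example.
CODONS = {
    "ATG": "M", "CAG": "Q", "ATC": "I", "TTT": "F", "GTG": "V",
    "AAG": "K", "ACC": "T", "CTC": "L", "ACG": "T", "GGC": "G",
}


def translate_prefix(dna: str, orf_start: int, n_codons: int) -> str:
    """Translate n_codons codons starting at the 1-based position orf_start."""
    offset = orf_start - 1
    codons = [dna[offset + 3 * i: offset + 3 * i + 3] for i in range(n_codons)]
    return "".join(CODONS[c] for c in codons)


# First 41 bases of the NOV7a DNA sequence as listed in Table 7A.
dna_start = "ACGCAGAGATGCAGATCTTTGTGAAGACCCTCACGGGCAAG"
print(translate_prefix(dna_start, 9, 11))  # prints MQIFVKTLTGK
```

The translated prefix agrees with the first eleven residues of the NOV7a protein sequence, which supports reading "ATG at 9" as a 1-based coordinate.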



Further analysis of the NOV7a protein yielded the following properties shown in Table 

7B. 



Table 7B. Protein Sequence Properties NOV7a 


PSort 
analysis: 


0.9800 probability located in nucleus; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome 
(lumen); 0.0000 probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV7a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 7C. 



Table 7C. Geneseq Results for NOV7a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV7a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities 
for the 
Matched 
Region 


Expect 
Value 


AAB52080 


Gene 16 human secreted protein 
homologous amino acid sequence 
#129 - Sus scrofa, 128 aa. 
[WO200061596-A1, 19-OCT-2000] 


1..128 
1..128 


111/128(86%) 
118/128(91%) 


7e-61 


AAG43861 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 54871 - Arabidopsis 
thaliana, 128 aa. [EP1033405-A2, 06- 
SEP-2000] 


1..128 
1..128 


101/128 (78%) 
113/128(87%) 


9e-55 


AAG36188 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 44314 - Arabidopsis 
thaliana, 249 aa. [EP1033405-A2, 06- 
SEP-2000] 


1..128 
122..249 


101/128(78%) 
113/128(87%) 


9e-55 


AAG36187 


Arabidopsis thaliana protein fragment 


1..128 
137..264 


101/128 (78%) 
113/128 (87%) 


9e-55 
thaliana, 264 aa. [EP1033405-A2, 06- 
SEP-2000] 








AAG36186 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 44312 - Arabidopsis 
thaliana, 322 aa. [EP1033405-A2, 06- 
SEP-2000] 


1..128 
195..322 


101/128 (78%) 
113/128 (87%) 


9e-55 



In a BLAST search of public sequence databases, the NOV7a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 7D. 



Table 7D. Public BLASTP Results for NOV7a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV7a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9BX98 


UBIQUITIN A-52 RESIDUE 
RIBOSOMAL PROTEIN FUSION 
PRODUCT 1 - Homo sapiens 
(Human), 141 aa (fragment). 


1..128 
14..141 


111/128(86%) 
118/128(91%) 


3e-60 


Q9UPK7 


UBIQUITIN-52 AMINO ACID 
FUSION PROTEIN - Homo sapiens 
(Human), 128 aa. 


1..128 
1..128 


111/128 (86%) 
118/128(91%) 


3e-60 


Q9PT09 


UBIQUITIN - Oncorhynchus mykiss 
(Rainbow trout) (Salmo gairdneri), 128 
aa. 


1..128 
1..128 


110/128(85%) 
118/128(91%) 


6e-60 


O42388 


UBIQUITIN-RIBOSOMAL 
PROTEIN FUSION PROTEIN - 
Gallus gallus (Chicken), 128 aa. 


1..128 
1..128 


110/128 (85%) 
117/128(90%) 


7e-60 


Q9XSV1 


UBIQUITIN-RIBOSOMAL 
PROTEIN L40 FUSION PROTEIN - 
Canis familiaris (Dog), 128 aa. 


1..128 
1..128 


110/128(85%) 
117/128(90%) 


1e-59 



Pfam analysis predicts that the NOV7a protein contains the domains shown in 
Table 7E. 



Table 7E. Domain Analysis of NOV7a 


Pfam Domain 


NOV7a Match 
Region 


Identities/ 
Similarities 
for the Matched Region 


Expect 
Value 


ubiquitin: domain 1 of 1 


1..74 


54/83 (65%) 
72/83 (87%) 


1.9e-38 
Ribosomal_L40e: domain 1 of 1 


77..128 


30/52 (58%) 
42/52 (81%) 


7.3e-20 





Example 8. 

The NOV8 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 8A. 



Table 8A. NOV8 Sequence Analysis 




SEQ ID NO: 19 


2296 bp 


NOV8a, 

CG57871-01 DNA Sequence 


CGGCGGCGGCGGCAGTAGAAATGATGGAAGAATTGCATAGCCTGGACCCACGACGGCA 
GAAATTATTGGAGGCCAGGTTTACTGGAGTAGGTGTTAGTAAGGGACCACTTAATAGT 
GAGTCTTCCAACCAGAGCTTGTGCAGCGTCGGATCCTTGAGTGATAAAGAAGTAGAGA 
CTC CCAAGAAAAAGCAG AATGACCAGCGAAATCGGAAAAGAAAAG CTGAACCATATGA 
AAGTAGCCAAGGGAAAGGCACTCCTAGGGGACATAAAATTAGTGATTACTTTGAGTTT 
GCTGGGGGAAGCGGGCCGGGAACCAGCCCTGGCAGAAGTGTTCCACCAGTTGCACGAT 
CCTCACTGCAACATTCTTTATCCAATC CCTT ACCG CGACGAGTAGAACAGCCCCT CT A 
TGGTTTAGATGGCAGTGCTGCAAAGGAGGCAACGGAGGAGCAGTCTGCTCTGCCAACC 
CTCATGT CAGTG ATG CT AG CAAAAC CT CG G CTTG ACA CAGAG C AG CTGGCG CAAAGGG 
G AG CTGG CCT CTG CTTC ACTTTTGTTTCAG CT CAGCAAAACAG T CCCTC AT CT ACGGG 
ATCTGGCAACACAGAGCATTCCTGCAGCTCCCAAAAACAGATCTCCATCCAGCACAGA 
CAGACCCAGTCCGACCTCACAATAGAAAAAATATCTGCACTAGAAAACAGTAAGAATT 
CTGACTTAGAGAAGAAGGAGGGAAGAATAGATGATTTATTAAGAGCCATCTGTGATTT 
GAGACGGCAGATTGATGAACAGCAAAAGATGCTAGAGAAATACAAGGAACGATTAAAT 
AGATGTGTGACAATGAGCAAGAAACTCCTTATAGAAAAGTCAAAACAAGAGAAGATGG 
CGTCTAGAGATAAGAGCATGCAAGACCGCTTX3AGACTGGGCCACTTTACTACGTCTGA 
CCACGG AGCCAAATTTACTGAGCAG TGG ACAG ATGGTT ATGCTTTTr ARafiTPTT HTC 
AAGCAACAGGAAAGGATAAATTCACAGAGGGAAGAGATAGAAAGACAACGGAAAATGT 
TAGCAAAGCGGAAACCTCCTGCCATGGGTCAGGCCCCTCCTGCAACCAATGAGCAGAA 
ACAGTGGAAAAGCAAGACCAATGGAGCTGAAAATGAAACGTTAACGTTAAAAGAATAC 
CATGAACAAGAAGAAATCTTCAAACTCAGATTAGGTCATCTTAAAAAGGAGGAAGCAG 
AG ATC CAGG C AGAGCTGGAG AGGCTAG AAAGGGTTAG AAAACT ACATATCAGGGAAGT 
AAAAAGGATACATAATGAAGATAATTCACAATTTAAATATCATCCAACGCTAAATGAC 
AGATATTTGTTGTTACATCTTTTGGGTAGAGGAGGTTTCAGTGAAGTTTACAAGGCAT 
TTG ATCTAACAGAG CAAAG AT ACGT AG CTGTG AAAATTCACCAGTT AAATAAAAACTG 
G AG AGATGAG AAAAAGGAGAATTAC CACAAG CATG CATGTAGGG AATACCGGATTCAT 
AAAGAGCTGGACCATCCCAGAATAGTTAAGCTGTATGATTACTTTTCACTGGATACTG 
ACTCGTTTTGTACAGTATTAGAATACTGTGAGGGAAATGATCTGGACTTCTACCTGAA 
ACAGCACAAATTAATGTCAGAGAAAGAGGCCCGGTCCATTATCATGCAGATTGTGAAT 
GCTTTAAAGTACTTAAATGAAATAAAACCTCCCATCATACACTATGACCTCAAACCAG 
GTAATATTCTTTTAGAAAATGGTACAGCGTGTGGAGAGATAAAAATTACAGATTTTGG 
TCTTTCGAAGATCATGGATGATGATAG CTACAATTCAGTGG ATGG CATGGAGCT AACA 
T CAC AAGG TG CTGGT AC TT ATTGGT ATTT AC C AC C AG AG TG TTTTG TGGTTGGG AAAG 
AACC^CCAAAGATCTCAAATAAAGTTGATGTGTGGTCGGTGGGTGTGATCTTCTATCA 
GTGTCTTT ATGGAAGGAAG CCTTTTGGCC AT AACCAGTCTCAG CAAGACATC CTACAA 
G AG AATACG ATTCTTAAAG CTACTGAAGTG C AGTTCCCGCCAAAGCCGGTAGTAACAC 
CTGAAGCAAAGGCGTTGATTCGACGATGCTTGGCCTACCGAAAGGAGGACCGCATTGA 
TGTCCAGCAGCTGGCCTGTGATCCCTACTTGTTGCCTCACATCCGAAAGTCAGTCTCT 
ACGAGTAG CCCTGCTGGAGCTGCTATTGCATCAAC CTCTGGGGCGTCCAAT AACAGTT 
CTTCTAATTGAGACTGACTCCAAGGCCACAAACT 




ORF Start: ATG at 24 


ORF Stop: TGA at 2271 




SEQ ID NO: 20 


749 aa 


MW at 85415.8kD 


NOV8a, 

CG57871-01 Protein Sequence 


MEELHSLDPRRQKLLEARFTGVGVSKGPLNSESSNQSLCSVGSLSDKEVETPKKKQND 
QRNRKRKAEPYESSQGKGTPRGHKISDYFEFAGGSGPGTSPGRSVPPVARSSLQHSLS 
NPLPRRVEQPLYGLDGSAAKEATEEQSALPTLMSVMLAKPRLDTEQLAQRGAGLCFTF 
VSAQQNSPSSTGSGNTEHSCSSQKQISIQHRQTQSDLTIEKISALENSKNSDLEKKEG 
RI DDLLRA I CDLRRQI DEQQKMLEKYKERLNRCVTMSKiaL I E KSKQE KMACRDKSMQ 
DRLRI/3HFTTSDHGAKFTEQWTDGYAFQNLI KQQERINSQREE I ERQRKMLAKRKPPA 
MGQAP PATNEQKQWKSKTNGAENETLTLKE YHEQE E I FKLRIjGHLKKEEAE I QAELER 
LERVRKLH IREVKR IHNEDNSQFKYH PTLNDRYLLLHLLGRGGFS E VYKAFDLTEQRY 
VAVKIHQLNKNWRDEKKEMYHKHACREYRIHKELDHPRIVKLYDYFSLDTDSFCTVLE 
YCEGNDLDFYLKQHKLMSEKEARSIIMQIVNALKYLNEIKPPIIHYDLKPGNILLENG 
TACGEIKITDFGLSKIMDDDSYNSVDGMELTSQGAGTYWYLPPECFWGKEPPKISNK 
VD VWSVGVI FYQCLYGRKPFGHNQSQQDI LQENTI LKATEVQFP PKPWT PEAKALIR 
RCLAYRKEDRIDVQQLACDPYLLPHIRKSVSTSSPAGAAIASTSGASNNSSSN 
Further analysis of the NOV8a protein yielded the following properties shown in Table 

8B. 



Table 8B. Protein Sequence Properties NOV8a 


PSort 
analysis: 


0.9600 probability located in nucleus; 0.1000 probability located in mitochondrial 
matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability 
located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV8a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 8C. 



Table 8C. Geneseq Results for NOV8a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV8a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM39278 


Human polypeptide SEQ ID NO 
2423 - Homo sapiens, 718 aa. 
[WO200153312-A1, 26-JUL-2001] 


1..749 
2..718 


703/749 (93%) 
707/749 (93%) 


0.0 


AAM41064 


Human polypeptide SEQ ID NO 
5995 - Homo sapiens, 809 aa. 
[WO200153312-A1, 26-JUL-2001] 


1..749 
92..809 


695/750(92%) 
701/750(92%) 


0.0 


AAR76062 


Protein kinase PKU beta - Homo 
sapiens, 540 aa. [JP07132093-A, 23- 
MAY-1995] 


210..749 
1..540 


525/540 (97%) 
527/540 (97%) 


0.0 


AAR76061 


Protein kinase PKU alpha - Homo 
sapiens, 787 aa. [JP07132093-A, 23- 
MAY-1995] 


1..744 
49..783 


537/794 (67%) 
592/794 (73%) 


0.0 


ABB20910 


Protein #2909 encoded by probe for 
measuring heart cell gene expression 
- Homo sapiens, 404 aa. 
[WO200157274-A2, 09-AUG-2001] 


346..749 
1..404 


404/404 (100%) 
404/404 (100%) 


0.0 



In a BLAST search of public sequence databases, the NOV8a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 8D. 
Table 8D. Public BLASTP Results for NOV8a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV8a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9UKI7 


TOUSLED-LIKE KINASE 2 - 
Homo sapiens (Human), 749 aa. 


1..749 
1..749 


731/749 (97%) 
736/749 (97%) 


0.0 


O55047 


TOUSLED-LIKE KINASE - Mus 
musculus (Mouse), 717 aa. 


1..749 

1..717 


699/749 (93%) 
705/749(93%) 


0.0 


Q9Y4F7 


PKU-ALPHA - Homo sapiens 
(Human), 719 aa (fragment). 


1..749 
3..719 


700/749(93%) 
705/749 (93%) 


0.0 


Q9D5Y5 


TOUSLED-LIKE KINASE 2 
(ARABIDOPSIS) - Mus musculus 
(Mouse), 696 aa. 


1..656 
1..656 


629/656(95%) 
640/656 (96%) 


0.0 


Q90ZY7 


PKU-ALPHA PROTEIN KINASE - 
Brachydanio rerio (Zebrafish) 
(Zebra danio), 697 aa. 


1..749 
2..697 


580/753 (77%) 
626/753(83%) 


0.0 



Pfam analysis predicts that the NOV8a protein contains the domains shown in 
Table 8E. 



Table 8E. Domain Analysis of NOV8a 


Pfam Domain 


NOV8a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


A2M: domain 1 of 1 


501..523 


10/23 (43%) 
20/23 (87%) 


4.6 


Pkinase: domain 1 of 
1 


439..718 


96/316(30%) 
213/316(67%) 


5.4e-70 



Example 9. 

The NOV9 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 9A. 



Table 9A. NOV9 Sequence Analysis 




SEQ ID NO: 21 2060 bp 


NOV9a, 

CG58590-01 DNA Sequence 


G TTTT CAT AG AT AAC CAT G AC AA CATC C CAT ATG AATGG G C ATGTT AC AG AGG AAT C A 
G AC AG CGAAGTAAAAAATGTTGATCTTGCATCACCAG AGG AACATC AG AAG CACCGAG 
AG ATGGCTGTTGACTGC C CTGG AGATTTGGG C AC C AGG ATGATGCCAAT ACGT CG AAG 
TGC ACAGTTGG AGCGTATTCGG C AACAACAGG AGG ACATG AGG CGTAGG AG AG AGGAA 
GAAGGGAAAAAGCAAGAACTTGACCTTAATTCTTCCATGAGACTTAAGAAACTAGCCC 
AAATTCCTCCAAAGACCGGAATAGATAACCCTATGTTTGATACAGAGGAAGGAATTGT 
CTTAGAAAGTCCTCATTATCCTCTGAAAATATTAGAAATAGAAGACTTGTTTTCTTCA 
CTT AAACAT ATC CAAC AT ACTTTGG T AG ATTCTC AG AG C CAGG AGG AT ATTTCAC TG C 
TTTTACAACTTGTTCAAAATAAGGATTTCCAGAATGCATTTAAGATACACAATGCCAT 
CACAGTACACATGAACAAGGCCAGTCCTCCATTTCCTCTTATCTCCAACGCACAAGAT 
CTTG C TC AAG AGGT ACAAACT G TTTTG AAG CCAG TT CATCAT AAGG AAGG AC AAG AAC 
TAACTGCTTTGCTGAATACTCCACATATTCAGGCACTTTTACTGGCCCACGATAAGGT 
TGCTGAGCAGGAAATGCAGCTAGAGCCCATTACAGATGAGAGAGTTTATGAAAGTATT 
GG CCAGTATGGAGGAGAAACTGT AAAAATAGTTCGTATAGAAAAGG CTCGTGAT ATTC 
CGTTGGGTGCTACAGTTCGTAATGAAATGGACTCTGTCATCATTAGCCGGATAGTAAA 
AGGGGGTGCTGCAGAGAAAAGTGGTCTGTTGCATGAAGGAGATGAAGTTCTAGAGATT 
AATGGCATTGAAATT CGGGGG AAAGATGT CAATGAGGTTTTTGACTTGTTGTCTGATA 
TG CATGGTACTTTGACTTTTGTCCTG ATTC CCAG TCAACAG AT CAAGCCGCCTCCTGC 
CAAGGAAACAGTAATCCATGTAAAAGCTCATlTrGACTATGACCCCTCAGATGACCCT 
TATGTTCCATGTCGAGAGTTAGGTCTGTCTTTTCAAAAAGGTGATATACTTCATGTGA 
TCAGTCAAGAAGATCCAAACTGGTGGCAGGCCTACAGGGAAGGGGACGAAGATAATCA 
ACCTCTAG CCGGG CTTGTTCCAGGG AAAAGCTTTCAGCAGCAAAGGG AAGCCATG AAA 
CT^CCATAGAAGAAGATAAGGAGCCAGAAAAATCAGGTAAACTCTGGTGTGCAAAGA 
AGAATAAAAAGAAGAGGAAAAAGGTTTTATATAATGCCAATAAAAATGATGATTATGA 
CAACGAGGAGATCTTAACCTATGAGGAAATGTCACTTTATCATCAGCCAGCAAATAGG 
AAGAGAC CT ATCATCTTG ATTGGTCCACAG AACTGTGGCCAG AATG AATTG CGTCAG A 
GGCT CATGAACAAAGAAAAGGACCGCTTTG CATCTGCAGTTCCT CGTACAACCCGGAG 
TAGG CGAG ACCAAGAAGTAGCCGGTAGAG ATTACCACTTTGTTT CGCGGCAAGCATTC 
GAGGCAGACATAGCAGCTGGAAAGTTCATTGAGCATGGTGAATTTGAGAAGAATTTGT 
ATGGAACTAGCATAGATTCTGTACGGCAAGTGATCAACTCTGGCAAAATATGTCTTTT 
AAGTCTTCGTACACAGTCATTGAAGACTCTCCGGAATTCAGATTTGAAACCATATATT 
ATCTT CATTG CACCCC CTTCACAAG AAAGACTT CGGGCATTATTGG C C AAAG AAGG CA 
AGAATCCAAAGCCTGAAGAGTTGAGAGAAATCATTGAGAAGACAAGAGAGATGGAGCA 
GAACAATGGCCACTACTTTGATACGGCAATTGTGAATTCCGATCTTGATAAAGCCTAT 
CAGGAATTGCTTAC^TTAATTAACAAACTTGATACTGAACCTCAGTGGGTACCATCCA 
CTTGGCTG AGG T G AAAG AAA CAT C C AT TCT 




ORF Start: ATG at 17 


ORF Stop: TGA at 2042 




SEQ ID NO: 22 


675 aa MW at 77311.8kD 


NOV9a, 

CG58590-01 Protein Sequence 


MTTSHMNGHVTEESDSEVKNVDLASPEEHQKHREMAVDCPGDIiGTRMMPIRRSAQLER 
IRQQQEDMRRRREEEGKKQELDLNSSMRLKKLAQIPPKTGIDNPMFDTEEGIVLESPH 
YAVKILEIEDLFSSLKHIQHTLVDSQSQEDISLLLQLVQNKDFQNAFKIHNAITVHMN 
KASPPFPLISNAQDIAQEVCTVLKPVHHKEGQELTALLNTPHIQALLLAHDKVAEQEM 
QLEPITDERVYES IGQYGGETVKI VRI EKARDI PLGATVRNEMDS VI I SRI VKGGAAE 
KSGLLHEGDEVLE INGI EIRGKDVNEVFDLLSDMHGTLTFVLI PSQQI KPPPAKETVI 
HVKAH FD YD PS DD PYVPCRELGLS FQKGD I LHVI S QED PNWWQA YREGDEDNQ P LAGL 
VPGKS FQQQREAMKQTI EEDKE PEKSGKLWCAKKNKKKRKKVLYNANKNDD YDNEE I L 
TYEEMSL YHQPANRKRPI I LIGPQNCGQNELRQRLMNKEKDRFASAVPRTTRSRRDQE 
VAGRDYHFVSRQAFEADIAAGKFIEHGEFEKNLYGTSIDSVRQVINSGKICLLSLRTQ 
S LKTLRNSDLKPY 1 1 FI APPSQERLRALLAKEGKNPKPEELRE 1 1 EKTREMEQNNGHY 
FDTAI VNSDLDKAYQELLRLINKLDTE PQWVPSTWLR 




SEQ ID NO: 23 


2030 bp 


NOV9b, 

CG58590-02 DNA Sequence 


CCATGACAACATCCCATATGAATGGGCATGTTACAGAGGAATCAGACAGCGAAGTAAA 

AAATGTTGATCTTGCATCAC(^GAGGAACATCAGAAGCACCGAGAGATGGCTGTTGAC 

TGCCCTGGAGATTTGGGCACCAGGATGATGCCAATACGTCGAAGTGCACAGTTGGAGC 

GTATTCGGCAACAACAGGAGGACATGAGGCGTAGGAGAGAGGAAGAAGGGAAAAAGCA 

AGAACTTGACCTTAATTCTTCCATGAGACTTAAGAAACTAGCCCA^TTCCTCCAAAG 

ACCGGAATAGATAACCCTATGTTTGATACAGAGGAAGGAATTGTCTTAGAAAGTCCTC 

ATT ATGCTG TG AAAAT ATT AG AAATAG AAG ACTTG TTTT CTTC ACTT AAACAT AT C C A 

ACATACTTTGGTAGATTCTCAGAGCCAGGAGGATATTTCACTGCTTTTACAACTTGTT 

CAAAATAAGGATTTCCAGAATGCATTTAAGATACACAATGCCATCACAGTACATATGA 

ACAAGGCCAGTCCTCCATTTCCTCTTATCTCCAACGCACAAGATCTTGCTCAAGAGGT 

ACIAAACTG TTTTG AAGCCAGTTCATCATAAC^AAGGACAAGAACTAACTGCTTTG 

AATACTCCACATATTCAGGCACTTTTACTGGCCCACGATAAGGTTGCTGAGCAGGAAA 

TGCAGCTAGAGCCCATTACAGATGAGAGAGTTTATGAAAGTATTGGCCAGTATGGAGG 

AGAAACTGTAAAAATAGTTCGTATAGAAAAGGCTCGTGATATTCCGTTGGGTGCTACA 

G TT CGT AATG AAATGG ACT CTGT CATCATT AGC CGG AT AGT AAAAGGGGG TGC TG C AG 

AGAAAAGTGGTCTGTTGCATGAAGGAGATGAAGTTCTAGAGATTAATGGCATTGAAAT 

TCGGX3GGAAAGATGTCAATGAGGTTTTTGACCITGTTGTCTGATATGCATGGTACTTTG 

ACTTTTGTCCTGATTCCCAGTCAACAGATCAAGCCGCCTCCTGCCAAGGAAACAGTAA 

TCCATGT AAAAG CT C ATTTTG ACT ATG AC C CCT CAG ATG ACCCTT ATG TTCC ATGTCG 

AGAGTTAGGTCTGTCTTTTCAAAAAGGTGATATACTTCATGTGATCAGTCAAGAAGAT 

CC AAACTGGTGG CAGGCCTACAGGGAAGGGGACGAAG ATAATCAACCTCT AG CCGGGC 

TTG TT C CAGGG AAAAG CTTT C AG CAG CAAAGGG AAG C CATG AAA C AAACCAT AG AAG A 

AGATAAGGAG CCAG AAAAATCAGGAAAACTGTGGTCTG CAAAG AAGAATAAAAAGAAG 

AGGAAAAAGGTTTTATATAATGCCAAXAAAAATGATGATTATGACAACGAGGAGATCT 

TAACCTATGAGGAAATGTCACTTTATCATCAGCCAGCAAATAGGAAGAGACCTATCAT 

CTTGATTGGTCCACAGAACTGT^GCCAGAATGAATTGCGTCAGAGGCTCATGAACAAA 

GAAAAGG ACQ3CTTTGCATCTG CAGTTC CTC AT ACAACCCGG AGT AGGCGAG ACC AAG 

AAGT AG C CGG TAG AG ATT ACC A CTTTG TTT CGCGG CAAG CATTCG AGGCAG ACAT AG C 

AGCTGGAAAGTTCATTGAGCATGGTGAATTTGAGAAGAATTTGTATGGAACTAGCATA 

GATTCTCTACGGCAAGTGATCAACTCTGGCAAAATATGTCTTTTAAGTCTTCGTACAC 

AG TCATTG AAG ACTCTC CGG AATTC AG ATTTG AAACCAT AT ATT ATCTT CATTG CAC C 
CCCTTCACAAGAAAGACTTCGGGCATT ATTGGCCAAAGAAGGCAAG AATC CAAAG CCT 
G AAG AG TTG AG AG AAATCATTG AG AAG ACAAG AG AGATGG AG C AG AA CAATGG CC AC T 
ACTTTG ATACGGCAATTGTGAATTCCG ATCTTGATAAAGC CT AT CAGGAATTGCTTAG 
GTTAATTAACAAACTTGATACTGAACCTCAGTGGGTACCATCCACTTGGCTGAGGTGA 




ORF Start: ATG at 3 


ORF Stop: TGA at 2028 




SEQ ID NO: 24 


675 aa MW at 77292.8kD 


NOV9b, 

CG58590-02 Protein Sequence 


OTTSHMNGHVTEESDSEVKNVDLASPEEHQKHREMAVDCPGDIX3TRMMPIRRSAQLER 
IRQQQEDMRRRREEEGKKQELDLNSSMRLKKLAQIPPKTGIDNPMFDTEEGIVLESPH 
YAVKILEIEDLFSSLKHIQHTLVDSQSQEDISLLLQLVQNKDFQNAFKIHNAITVHMN 
KAS PPFPLI SNAQDLAQEVQTVLKPVHHKEGQELTALLNTPH IQALLLAHDKVAEQEM 
QLEPITDERVYESIGQYGGETVKIVRIEKARDIPIiGATVRNEMDSVIISRIVKGGAAE 
KSGLLH EGDE VLiE I NG I E I RGKDVNEVFD LLS DMHGTLT FVL I P SQQ I KP P P AKETV I 
HVKAHFDYDPSDDPYVPCRELGLSFQKGDILHVISQEDPNWWQAYREGDEDNQPLAGL 
VPGKSFQQQREAMKQTIEEDKEPEKSGKLWCAKKNKKKRKKVLYNANKNDDYDNEEIL 
TYEEMSLYHQPANRKRPI I L I G PQNCGQNELRQRLMNKEKDRFASA V PHTTRSRRDQE 
VAGRDYHFVSRQAFEADI AAGKFI EHGE FEKKTLYGTS I DS VRQVINSGKI CLLS LRTQ 
SLKTLRNSDLKPYI IFIAPPSQERLRALLAKEGKNPKPEELREI IEKTREMEQNNGHY 
FDTAIVNSDLDKAYQELLRLINKLDTEPQWVPSTWLR 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 9B. 



Table 9B. Comparison of NOV9a against NOV9b. 


Protein 
Sequence 


NOV9a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV9b 


1..675 
1..675 


636/675 (94%) 
636/675 (94%) 
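
Because Table 9B compares two sequences of equal length over residues 1..675 of each, an ungapped, position-by-position comparison is one simple way to obtain an identity count of this kind. The following is a minimal sketch under that assumption; it is not stated in the source that the reported figures were computed this way. The function name `count_identities` and the short strings in the usage line are illustrative only and are not taken from the sequence listings.

```python
def count_identities(a: str, b: str) -> tuple[int, int]:
    """Count positions with identical residues in two equal-length sequences.

    Returns (identical positions, compared length). No gaps are introduced,
    so this is only meaningful for sequences that align end to end.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, len(a)


# Illustrative toy strings (hypothetical, not from the tables).
identical, compared = count_identities("MKPDET", "MKPEET")
print(f"{identical}/{compared}")
```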



Further analysis of the NOV9a protein yielded the following properties shown in Table 

9C. 



Table 9C. Protein Sequence Properties NOV9a 


PSort 
analysis: 


0.7000 probability located in nucleus; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.0000 probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV9a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 9D. 



Table 9D. Geneseq Results for NOV9a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV9a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 
AAB94180 


Human protein sequence SEQ ID 
NO: 14494 - Homo sapiens, 503 aa. 
[EP1074617-A2, 07-FEB-2001] 


173..675 
1..503 


501/503 (99%) 
501/503 (99%) 


0.0 


AAB41921 


Human ORFX ORF1685 polypeptide 
sequence SEQ ID NO:3370 - Homo 
sapiens, 269 aa. [WO200058473-A2, 
05-OCT-2000] 


406..675 
1..269 


261/270 (96%) 
264/270 (97%) 


e-147 


AAU07123 


Human novel human protein, NHP 
#23 - Homo sapiens, 576 aa. 
[WO200161016-A2, 23-AUG-2001] 


143..674 
31..574 


224/564 (39%) 
339/564 (59%) 


e-109 


AAU07119 


Human novel human protein, NHP 
#19 - Homo sapiens, 560 aa. 
[WO200161016-A2, 23-AUG-2001] 


143..654 
31..554 


213/544 (39%) 
327/544(59%) 


e-102 


AAU07115 


Human novel human protein, NHP 
#15 - Homo sapiens, 520 aa. 
[WO200161016-A2, 23-AUG-2001] 


143..606 
31..495 


196/481 (40%) 
300/481 (61%) 


5e-97 



In a BLAST search of public sequence databases, the NOV9a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 9E. 



Table 9E. Public BLASTP Results for NOV9a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV9a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities 
for the 
Matched 
Portion 


Expect 
Value 


Q9JLB2 


PALS1 - Mus musculus (Mouse), 
675 aa. 


1..675 
1..675 


652/675 (96%) 
665/675 (97%) 


0.0 


Q9H9Q0 


CDNA FLJ12615 FIS, CLONE 
NT2RM4001629, WEAKLY 
SIMILAR TO MAGUK P55 
SUBFAMILY MEMBER 3 - Homo 
sapiens (Human), 503 aa. 


173..675 
1..503 


501/503 (99%) 
501/503 (99%) 


0.0 


AAL40935 


STARDUST PROTEIN MAGUK 1 
ISOFORM - Drosophila 
melanogaster (Fruit fly), 1289 aa. 


252..674 
829..1282 


252/460 (54%) 
327/460 (70%) 


e-140 


Q9W3H6 


CG1617 PROTEIN - Drosophila 
melanogaster (Fruit fly), 794 aa. 


252..674 
294..787 


252/500 (50%) 
327/500 (65%) 


e-132 


Q9W7F1 


P55-RELATED MAGUK 
PROTEIN DLG3 - Brachydanio 
rerio (Zebrafish) (Zebra danio), 576 
aa. 


142..673 
30..573 


209/556 (37%) 
335/556 (59%) 


e-105 
Pfam analysis predicts that the NOV9a protein contains the domains shown in 
Table 9F. 



Table 9F. Domain Analysis of NOV9a 


Pfam Domain 


NOV9a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


L27: domain 1 of 1 


186..238 


19/56 (34%) 
39/56 (70%) 


0.049 


PDZ: domain 1 of 1 


256..335


21/83(25%) 
58/83 (70%) 


9.7e-12 


SH3: domain 1 of 1 


348..415 


19/68 (28%) 
46/68 (68%) 


0.026 


Guanylate_kin: domain 1 of 1


515..624 


54/113(48%) 
87/113(77%) 


6.2e-38 


Peptidase_S15: domain 1 of 1


642..658 


6/17 (35%) 
13/17(76%) 


8.2 


Caulimo mov: domain 1 of 1


420..673 


59/335 (18%) 
156/335 (47%) 


6.1 
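


The expect values in Table 9F span many orders of magnitude, so it can be useful to separate well-supported domain hits from marginal ones. The following is a minimal Python sketch under stated assumptions: the 0.01 cutoff and the names `DOMAIN_HITS` and `significant_domains` are illustrative and do not come from the source; only the expect values are transcribed from Table 9F.

```python
# Pfam hits for NOV9a as (domain name, expect value), transcribed from Table 9F.
DOMAIN_HITS = [
    ("L27", 0.049),
    ("PDZ", 9.7e-12),
    ("SH3", 0.026),
    ("Guanylate_kin", 6.2e-38),
    ("Peptidase_S15", 8.2),
    ("Caulimo mov", 6.1),
]


def significant_domains(hits, cutoff=0.01):
    """Return the names of hits whose expect value is below the cutoff.

    The default cutoff is an assumption chosen for illustration only.
    """
    return [name for name, expect in hits if expect < cutoff]


print(significant_domains(DOMAIN_HITS))
```

With this illustrative cutoff only the PDZ and guanylate kinase hits are retained; a looser cutoff would also admit the L27 and SH3 hits.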



Example 10. 



The NOV10 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 10A. 



Table 10A. NOV10 Sequence Analysis 




SEQ ID NO: 25 


576 bp 


NOV10a,

CG58572-01 DNA Sequence 


ACCTTACTAGAAAAATGAAACCTGATGAAACTCCTATGTTTGACCCAAGTCTACTCAA 
AGAAGTGGACTGGAGTCAGAATACAGCTACATTTTCTCCAGCCATTTCCCCAACACAT 
CCTGGAGAAGGCTTGGTTTTGAGGCCTCTTTGTACTGCTGACTTAAATAGAGGTTTTT 
TTAAGGTATTGGGTCAGCTAACAGAGACTGGAGTTGTCAGCCCTGAACAATTTATGGA 
ATCTTTTGAGCATATGAAGAAATCTGGGGATTATTATGTTACAGTTGTAGAAGATGTG 
ACTCTAGGACAGATTGTTGCTACGGCAACTCTGATTATAGAACATAAATTCATCCATT 
CCTGTGCTAAGAGAGGAAGAGTAGAAGATGTTGTTGTTAGTGATGAATGCAGAGGAAA 
GCAGCTTGGCAAATTGTTATTATCAACCCTTACTTTGCTAAGCAAGAAACTGAACTGT 
TACAAGATTACCCTTGAATGTCTACCACAAAATGTTGGTTTCTATAAAAAGTTTGGAT 
ATACTGTATCTGAAGAAAACTACATGTGTCGGAGGTTTCTAAAGTAAAAATCTT 




ORF Start: ATG at 15


ORF Stop: TAA at 567 




SEQ ID NO: 26 


184 aa MW at 20749.9kD 


NOV10a,

CG58572-01 Protein Sequence 


MKPDETPMFDPSLLKEVDWSQNTATFSPAISPTHPGEGLVLRPLCTADLNRGFFKVLG 
QLTETGVVSPEQFMESFEHMKKSGDYYVTVVEDVTLGQIVATATLIIEHKFIHSCAKR
GRVEDVVVSDECRGKQLGKLLLSTLTLLSKKLNCYKITLECLPQNVGFYKKFGYTVSE
ENYMCRRFLK




SEQ ID NO: 27


560 bp 


NOV10b,

CG58572-02 DNA Sequence 


ATGAAACCTGATGAAACTCCTATGTTTGACCCAAGTCTACTCAAAGAAGTGGACTGGA 
GTCAGAATACAGCTACATTTTCTCCAGCCATTTCCCCAACACATCCTGGAGAAGGCTT 
GGTTTTGGGGCCTCTTTGTACTGCTGACTTAAATAGAGGTTTTTTTAAGGTATTGGGT 
CAGCTAACAGAGACTGGAGTTGTCAGCCCTGAACAATTTATGAAATCTTTTGAGCATA 
TGAAGAAATCTGGGGATTATTATGTTACAGTTGTAGAAGATGTGACTCTAGGACAGAT 
TGTTGCTACGGCAACTCTGATTATAGAACATAAATTCATCCATTCCTGTGCTAAGAGA
GGAAGAGTAGAAGATGTTGTTGTTAGTGATGAATGCAGAGGAAAGCAGCTTGGCAAAT 
TGTTATTATCAACCCTTACTTTGCTAAGCAAGAAACTGAACTGTTACAAGATTACCCT
TGAATGTCTACCACAAAATGTTGGTTTCTATAAAAAGTTTGGATATACTGTATCTGAA 
GAAAACTACATGTGTCGGAGGTTTCTAAAGTAAAAATC 




ORF Start: ATG at 1 


ORF Stop: TAA at 553 




SEQ ID NO: 28 


184 aa MW at 20649.8kD


NOV10b,

CG58572-02 Protein Sequence 


MKPDETPMFDPSLLKEVDWSQNTATFSPAISPTHPGEGLVLGPLCTADLNRGFFKVLG
QLTETGVVSPEQFMKSFEHMKKSGDYYVTVVEDVTLGQIVATATLIIEHKFIHSCAKR
GRVEDVVVSDECRGKQLGKLLLSTLTLLSKKLNCYKITLECLPQNVGFYKKFGYTVSE
ENYMCRRFLK
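


The ORF coordinates and polypeptide lengths reported in Table 10A can be cross-checked arithmetically: the distance from the first base of the start codon to the first base of the stop codon, divided by three, should equal the number of residues. The following is a small Python sketch of that check; the function name `orf_codons` and the dictionary layout are illustrative assumptions, while the coordinates and lengths are those given in Table 10A.

```python
def orf_codons(start: int, stop: int) -> int:
    """Count coding codons between a 1-based start-codon position and the
    1-based position of the stop codon (the stop codon itself is excluded)."""
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("stop codon is not in frame with the start codon")
    return span // 3


# (ORF start, ORF stop, reported polypeptide length) from Table 10A.
TABLE_10A = {
    "NOV10a": (15, 567, 184),
    "NOV10b": (1, 553, 184),
}

for name, (start, stop, reported_aa) in TABLE_10A.items():
    print(name, orf_codons(start, stop), reported_aa)
```

For both entries the computed codon count agrees with the reported 184 aa length.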



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 10B. 



Table 10B. Comparison of NOV10a against NOV10b.


Protein
Sequence


NOV10a Residues/
Match Residues


Identities/
Similarities for the Matched Region


NOV10b


1..184
1..184


163/184 (88%)
164/184 (88%)
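


Each identity or similarity cell in these tables pairs a ratio with a percentage. The Python sketch below parses such a cell and recomputes the percentage from the ratio. It assumes that the percentage is the ratio truncated to a whole number, which is an inference from the values in these tables rather than something the source states; the names `parse_identity` and `truncated_percent` are illustrative.

```python
import re

# Matches cells such as "163/184 (88%)" or "163/184(88%)".
_CELL = re.compile(r"(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)")


def parse_identity(cell: str):
    """Return (matched, aligned_length, reported_percent) from a table cell."""
    m = _CELL.search(cell)
    if m is None:
        raise ValueError(f"not an identity cell: {cell!r}")
    matched, length, percent = (int(g) for g in m.groups())
    return matched, length, percent


def truncated_percent(matched: int, length: int) -> int:
    """Percentage truncated to an integer (assumed reporting convention)."""
    return (100 * matched) // length


matched, length, reported = parse_identity("163/184 (88%)")
print(truncated_percent(matched, length), reported)
```

A check of this kind can flag cells whose printed percentage does not match the printed ratio, which may indicate a transcription error in either number.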



Further analysis of the NOV10a protein yielded the following properties shown in
Table 10C.



Table 10C. Protein Sequence Properties NOV10a


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.1206 probability located in 
microbody (peroxisome); 0.1000 probability located in mitochondrial 
matrix space; 0.1000 probability located in lysosome (lumen)


SignalP 
analysis: 


No Known Signal Sequence Predicted 
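


The PSort row of Table 10C lists several candidate locations with their probabilities. As a hedged illustration, the Python sketch below splits that text into (probability, location) pairs and picks the highest-scoring one. The helper names `parse_psort` and `most_likely_location` are assumptions made for this example; the text is copied from Table 10C.

```python
PSORT_NOV10A = (
    "0.4500 probability located in cytoplasm; "
    "0.1206 probability located in microbody (peroxisome); "
    "0.1000 probability located in mitochondrial matrix space; "
    "0.1000 probability located in lysosome (lumen)"
)


def parse_psort(text: str):
    """Split a PSort summary into a list of (probability, location) pairs."""
    pairs = []
    for part in text.split(";"):
        prob, _, location = part.strip().partition(" probability located in ")
        pairs.append((float(prob), location.strip()))
    return pairs


def most_likely_location(text: str):
    """Return the (probability, location) pair with the highest probability."""
    return max(parse_psort(text), key=lambda pair: pair[0])


print(most_likely_location(PSORT_NOV10A))
```

For NOV10a this selects the cytoplasm entry at 0.4500.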



A search of the NOV10a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 10D.



Table 10D. Geneseq Results for NOV10a


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV10a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Region


Expect
Value




AAG67123 


Amino acid sequence of human 50287 
transferase - Homo sapiens, 184 aa. 
[WO200164904-A2, 07-SEP-2001] 


1..184 
1..184 


183/184 (99%) 
184/184 (99%) 


e-105 


AAB73505 


Human transferase HTFS-12, SEQ ID 
NO: 12 - Homo sapiens, 184 aa. 
[WO200132888-A2, 10-MAY-2001] 


1..184 
1..184 


183/184 (99%) 
184/184 (99%) 


e-105 


AAB63700 


Human gastric cancer associated 
antigen protein sequence SEQ ID 
NO: 1062 - Homo sapiens, 200 aa. 
[WO200073801-A2, 07-DEC-2000] 


1..184 
17..200 


183/184 (99%)
184/184 (99%)


e-105 


AAU07779 


Human novel transferase protein, NHP 
#22 - Homo sapiens, 184 aa. 
[WO200164903-A2, 07-SEP-2001] 


1..184 
1..184 


182/184 (98%)
183/184(98%) 


e-104 


AAM79992 


Human protein SEQ ID NO 3638 - 
Homo sapiens, 206 aa. 
[WO200157190-A2, 09-AUG-2001] 


1..184 
23..206 


181/184 (98%)
183/184(99%) 


e-104 



In a BLAST search of public sequence databases, the NOV10a protein was found to
have homology to the proteins shown in the BLASTP data in Table 10E.



Table 10E. Public BLASTP Results for NOV10a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV10a
Residues/ 
Match 
Residues 


Identities/ 
Similarities 
for the 
Matched 
Portion 


Expect 
Value 


Q96EK6 


SIMILAR TO GLUCOSAMINE-PHOSPHATE
N-ACETYLTRANSFERASE - Homo
sapiens (Human), 184 aa.


1..184 
1..184 


183/184 (99%) 
184/184(99%) 


e-104 


Q9JK38 


EMEG32 PROTEIN 
(GLUCOSAMINE-PHOSPHATE
N-ACETYLTRANSFERASE) - Mus
musculus (Mouse), 184 aa.


1..184 
1..184 


180/184(97%) 
182/184(98%) 


e-102 


Q9VAI0 


Probable glucosamine-phosphate N- 
acetyltransferase (EC 2.3.1.4) 
(Phosphoglucosamine transacetylase) 
(Phosphoglucosamine acetylase) - 
Drosophila melanogaster (Fruit fly), 
219 aa. 


4..176
6..179


84/174(48%) 
123/174 (70%) 


2e-43 


Q17427


Probable glucosamine-phosphate N- 
acetyltransferase (EC 2.3.1.4) 
(Phosphoglucosamine transacetylase) 
(Phosphoglucosamine acetylase) - 
Caenorhabditis elegans, 165 aa. 


32..182
15..165


65/152 (42%)
98/152 (63%)


1e-28


O45811


T23G11.2 PROTEIN - 
Caenorhabditis elegans, 347 aa. 


42..184
201..340


63/143 (44%)
88/143 (61%) 


3e-26 



PFam analysis predicts that the NOV10a protein contains the domains shown in
Table 10F.



Table 10F. Domain Analysis of NOV10a


Pfam Domain 


NOV10a Match Region


Identities/ 
Similarities 
for the Matched Region 


Expect 
Value 


Acetyltransf: domain 1 of 1


89..171


22/87 (25%) 
62/87(71%) 


6.5e-13 



Example 11. 



The NOV11 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 11A.



Table 11A. NOV11 Sequence Analysis




SEQ ID NO: 29 


709 bp 


NOV11a,

CG58564-01 DNA Sequence 


CCCGCGGGCCAGCACCATGGAGGACGTGAAGCTGGAGTTCCCTTCCCTTCCACAGTGC 
AAGGAAGACGCCGAGGAGTGGACCTACCCTATGAGACGAGAGATGCAGGAAATTTTAC
CTGGATTGTTCTTAGGCCCATATTCATCTGCTATGAAAAGCAAGCTACCTGTACTACA 
GAAACATGGAATAACCCATATAATATGCATACGACAAAATATTGAAGCAAACTTTATT 
AAACCAAACTTTCAGCAGTTATTTAGGTATTTAGTCCTGGATATTGCAGATAATCCAG 
TTGAAAATATAATACGTTTTTTCCCTATGACTAAGGAATTTATTGATGGGAGCTTACA 
AATGGGAGGTAAAGTTCTTGTGCATGGAAATGCAGGGATCTCCAGAAGTGCAGCCTTT
GTTATTGCATACATTATGGAAACATTTGGAATGAAGTACAGGGATGCTTTTGCTTATG
TTCAAGAAAGAAGATTTTGTATTAATCCTAATGCTGGATTTGTCCATCAACTTCAGGA
ATATGAAGCCATCTACCTAGCAAAATTAACAATACAGATGATGTCACCACTCCAGATA 
GAAAGGTCATTATCTGTTCATTCTGGTACCACAGGTAGTTTGAAGAGAACACATGAAG
AAGAGGATGATTTTGGAACCATGCAAGTGGCGACTGCACAGAATGGCTGACTTGAAGA
GCAACATCATAGA 




ORF Start: ATG at 17 


ORF Stop: TGA at 686 




SEQ ID NO: 30 


223 aa 


MW at 25492.2kD 


NOV11a,

CG58564-01 Protein Sequence 


MEDVKLEFPSLPQCKEDAEEWTYPMRREMQEILPGLFLGPYSSAMKSKLPVLQKHGIT
HIICIRQNIEANFIKPNFQQLFRYLVLDIADNPVENIIRFFPMTKEFIDGSLQMGGKV
LVHGNAGISRSAAFVIAYIMETFGMKYRDAFAYVQERRFCINPNAGFVHQLQEYEAIY
LAKLTIQMMSPLQIERSLSVHSGTTGSLKRTHEEEDDFGTMQVATAQNG




SEQ ID NO: 31 


724 bp 


NOV11b,

CG58564-02 DNA Sequence 


ACTCTCCCACCCCACCCACCAGAATGGCGGGCCAGCACCATGGAGGACGTGAAGCTGG
AGTTCCCTTCCCTTCCACAGTGCAAGGAAGACGCCGAGGAGTGGACCTACCCTATGAG
ACGAGAGATGCAGGAAATTTTATCTGGATTGTTCTTAGGCCCATATTCATCTGCTATG
AAAAGCAAGCTACCTGTACTACAGAAACATGGAATAACCCATATAATATGCATACGAC 
AAAATATTGAAGCAAACTTTATTAAACCAAACTTTCAGCAGTTATTTAGATATTTAGT
CCTGGATATTGCAGATAATCCAGTTGAAAATATAATACGTTTTTTCCCTATGACTAAG 
GAATTTATTGATGGGAGCTTACAAATGGGAGGAAAAGTTCTTGTGCATGGAAATGCAG
GGATCTCCAGAAGTGCAGCCTTTGTTATTGCATACATTATGGAAACATTTGGAATGAA
GTACAGAGATGCTTTTGCTTATGTTCAAGAAAGAAGATTTTGTATTAATCCTAATGCT 
GGATTTGTCCATCAACTTCAGGAATATGAAGCCATCTACCTAGCAAAATTAACAATAC
AGATGATGTCACCACTCCAGATAGAAAGGTCATTATCTGTTCATTCTGGTACCACAGG
CAGTTTGAAGAGAACACATGAAGAAGAGGATGATTTTGGAACCATGCAAGTGGCGACT 
GCACAGAATGGCTGACTTGAAGAGCAAC 




ORF Start: ATG at 40 


ORF Stop: TGA at 709 




SEQ ID NO: 32 


223 aa 


MW at 25482.1kD


NOV11b,

CG58564-02 Protein Sequence 


MEDVKLEFPSLPQCKEDAEEWTYPMRREMQEILSGLFLGPYSSAMKSKLPVLQKHGIT
HIICIRQNIEANFIKPNFQQLFRYLVLDIADNPVENIIRFFPMTKEFIDGSLQMGGKV
LVHGNAGISRSAAFVIAYIMETFGMKYRDAFAYVQERRFCINPNAGFVHQLQEYEAIY
LAKLTIQMMSPLQIERSLSVHSGTTGSLKRTHEEEDDFGTMQVATAQNG




SEQ ID NO: 33 


545 bp 


NOV11c,

CG58564-03 DNA Sequence 


ACTCTCCCACCCCACCCACCAGCCCGCGGGCCAGCACCATGGAGGACGTGAAGCTGGA
GTTCCCTTCCCTTCCACAGTGCAAGGAAGACGCCGAGGAGTGGACCTACCCTATGAGA
CGAGAGATGCAGGAAATTTTACCTGGATTGTTCTTAGGCCCATATTCATCTGCTATGA
AAAGCAAGCTACCTGTACTACAGAAACATTTGGAATGAAGTACAGAGATGCTTTTGCT
TATGTTCAAGAAAGAAGATTTTGTATTAATCCTAATGCTGGATTTGTCCATCAACTTC
AGGAATATGAAGCCATCTACCTAGCAAAATTAACAATACAGATGATGTCACCACTCCA
GATAGAAAGGTCATTATCTGTTCATTCTGGTACCACAGGCAGTTTGAAGAGAACACAT
GAGGAAGAGGATGATTTTGGAACCATGCAAGTGGCGACTGCACAGAATGGCTGACTTG
AAGAGCAACATCATAGAGTGTGAATTTCTATTTGGGAAGGAGAAAATACAAGAGAAAA
TTATAATGTAAAATGGTAAAAAA




ORF Start: ATG at 39 


ORF Stop: TGA at 210 




SEQ ID NO: 34 


57 aa 


MW at 6695.7kD 


NOV11c,

CG58564-03 Protein Sequence 


MEDVKLEFPSLPQCKEDAEEWTYPMRREMQEILPGLFLGPYSSAMKSKLPVLQKHLE 




SEQ ID NO: 35 


663 bp 


NOV11d,

CG58564-04 DNA Sequence 


ACTCTCCCACCCCACCCACCAGCCCGCGGGCCAGCACCATGGAGGACGTGAAGCTGGA
GTTCCCTTCCCTTCCACAGTGCAAGGAAGACGCCGAGGAGTGGACCTACCCTATGAGA
CGAGAGATGCAGGAAATTTTACCTGGATTGTTCTTAGGCCCATATTCATCTGCTATGA
AAAGCAAGCTACCTGTACTACAGAAACATGGAATAACCCATATAATATGCATACGACA
AAATATTGAAGCAAACTTTATTAAACCAAACTTTCAGCAGTTATTTAGACTAAGGAAT
TTATTGATGGGAGCTTACAAATGGGAGGAAAAGTTCTTGTGCATGGAAATGCAGGGAT
CTCCAGAAGTGCAGCCTTTGTTATTGCATACATTATGGAAACATTTGGAATGAAGTAC
AGAGATGCTTTTGCTTATGTTCAAGAAAGAAGATTTTGTATTAATCCTAATGCTGGAT
TTGTCCATCAACTTCAGGAATATGAAGCCATCTACCTAGCAAAATTAACAATACAGAT
GATGTCACCACTCCAGATAGAAAGGTCATTATCTGTTCATTCTGGTACCACAGGCAGT
TTGAAGAGAACACATGAAGAAGAGGATGATTTTGGAACCATGCAAGTGGCGACTGCAC
AGAATGGCTGACTTGAAGAGCAACT




ORF Start: ATG at 39 


ORF Stop: TGA at 399 




SEQ ID NO: 36 


120 aa MW at 14245.6kD


NOV11d,

CG58564-04 Protein Sequence 


MEDVKLEFPSLPQCKEDAEEWTYPMRREMQEILPGLFLGPYSSAMKSKLPVLQKHGIT
HIICIRQNIEANFIKPNFQQLFRLRNLLMGAYKWEEKFLCMEMQGSPEVQPLLLHTLW
KHLE 
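


The predicted polypeptides in Table 11A follow from translating each DNA sequence from its stated ORF start until the first in-frame stop codon. The Python sketch below illustrates this for NOV11c using the standard genetic code. It assumes that the first 232 bases of SEQ ID NO: 33 are as transcribed here with extraction whitespace removed; the names `CODON_TABLE`, `NOV11C_PREFIX` and `translate` are illustrative and not part of the source.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# First 232 bases of the NOV11c DNA sequence (SEQ ID NO: 33), spaces removed.
NOV11C_PREFIX = (
    "ACTCTCCCACCCCACCCACCAGCCCGCGGGCCAGCACCATGGAGGACGTGAAGCTGGA"
    "GTTCCCTTCCCTTCCACAGTGCAAGGAAGACGCCGAGGAGTGGACCTACCCTATGAGA"
    "CGAGAGATGCAGGAAATTTTACCTGGATTGTTCTTAGGCCCATATTCATCTGCTATGA"
    "AAAGCAAGCTACCTGTACTACAGAAACATTTGGAATGAAGTACAGAGATGCTTTTGCT"
)


def translate(dna: str, start: int) -> str:
    """Translate from a 1-based start position up to the first stop codon."""
    protein = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon reached
            break
        protein.append(aa)
    return "".join(protein)


nov11c = translate(NOV11C_PREFIX, 39)
print(nov11c, len(nov11c))
```

Starting at position 39, translation stops at the TGA at position 210 and yields a 57-residue polypeptide, consistent with the ORF coordinates and length listed for NOV11c.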



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 11B.



Table 11B. Comparison of NOV11a against NOV11b through NOV11d.


Protein Sequence


NOV11a Residues/
Match Residues


Identities/
Similarities for the Matched Region


NOV11b


1..223
1..223


222/223 (99%)
222/223 (99%)


NOV11c


1..55
1..55


55/55 (100%)
55/55 (100%)


NOV11d


1..81 
1..81 


81/81 (100%) 
81/81 (100%) 



Further analysis of the NOV11a protein yielded the following properties shown in
Table 11C.



Table 11C. Protein Sequence Properties NOV11a


PSort 
analysis:


0.4698 probability located in microbody (peroxisome); 0.4500 probability 
located in cytoplasm; 0.1958 probability located in lysosome (lumen); 0.1000 
probability located in mitochondrial matrix space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV11a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 11D.



Table 11D. Geneseq Results for NOV11a


Geneseq 
Identifier 


Protein/Organism/Length [Patent
#, Date] 


NOV11a
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAU09017 


Human dual specificity phosphatase 
38692 - Homo sapiens, 223 aa. 
[WO200173059-A2, 04-OCT-2001] 


1..223 
1..223 


223/223 (100%) 
223/223 (100%) 


e-128 


AAE08552 


Human phosphatase protein - Homo 
sapiens, 223 aa. [WO200160992- 
A2, 23-AUG-2001] 


1..223 
1..223 


223/223 (100%) 
223/223 (100%) 


e-128 


AAM41520 


Human polypeptide SEQ ID NO 
6451 - Homo sapiens, 236 aa. 
[WO200153312-A1, 26-JUL-2001] 


1..223 
14..236 


223/223 (100%) 
223/223 (100%) 


e-128 


AAM39734 


Human polypeptide SEQ ID NO
2879 - Homo sapiens, 223 aa. 
[WO200153312-A1, 26-JUL-2001] 


1..223 
1..223 


223/223 (100%) 
223/223 (100%) 


e-128 


AAU23521 


Novel human enzyme polypeptide 
#607 - Homo sapiens, 190 aa. 
[WO200155301-A2, 02-AUG-2001] 


25..171 
7..145 


55/147 (37%) 
80/147 (54%) 


1e-18



In a BLAST search of public sequence databases, the NOV11a protein was found to
have homology to the proteins shown in the BLASTP data in Table 11E.



Table 11E. Public BLASTP Results for NOV11a


Protein 
Accession 
Number 


Protein/Organism/Length


NOV11a
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


CAD10219 


SEQUENCE 4 FROM PATENT 
WO0173059 - Homo sapiens
(Human), 223 aa. 


1..223 
1..223 


223/223 (100%) 
223/223 (100%) 


e-127 


Q9DCF8 


0610039A20RIK PROTEIN - Mus 
musculus (Mouse), 223 aa. 


1..223 
1..223 


215/223 (96%) 
221/223 (98%) 


e-124 


Q60970 


PROTEIN TYROSINE 
PHOSPHATASE-LIKE - Mus 
musculus (Mouse), 223 aa. 


1..223 
1..223 


214/223 (95%) 
221/223 (98%) 


e-124 


Q60969 


PROTEIN TYROSINE 
PHOSPHATASE-LIKE - Mus 
musculus (Mouse), 205 aa. 


1..168 
1..168 


163/168 (97%) 
167/168 (99%) 


2e-93 


Q99850 


TYROSINE PHOSPHATASE-LIKE -
Homo sapiens (Human), 66 aa
(fragment).


116..181
1..66


66/66 (100%)
66/66 (100%)


3e-31









PFam analysis predicts that the NOV11a protein contains the domains shown in
Table 11F.



Table 11F. Domain Analysis of NOV11a


Pfam Domain 


NOV11a Match Region


Identities/ 
Similarities 
for the Matched Region 


Expect 
Value 


DSPc: domain 1 of 1 


28..173 


64/172 (37%) 
127/172 (74%) 


2.2e-63 


Y_phosphatase: domain 1 of 1


35..179 


35/279 (13%)
93/279(33%) 


1.7 



Example 12. 

The NOV12 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 12A. 



Table 12A. NOV12 Sequence Analysis




SEQ ID NO: 37 


3696 bp 


NOV12a, 

CG57819-01 DNA Sequence 


GTGTAAAAATACTGTCCATTTAATGTTTTCTGGGACTTTAGGTAAGAATATGAAAACT 
CAACCACCCTTGAGCAGGATGAACCGGGAGGAATTGGAGGACAGTTTCTTTCGACTTC 
GCGAAGATCACATGTTGGTGAAGGAGCTTTCTTGGAAGCAACAGGATGAGATCAAAAG
GCTGAGGACCACCTTGCTGCGGTTGACCGCTGCTGGCCGGGACCTGCGGGTCGCGGAG
GAGGCGGCGCCGCTCTCGGAGACCGCAAGGCGCGGGCAGAAGGCGGGATGGCGGCAGC 
GCCTCTCCATGCACCAGCGCCCCCAGATGCACCGACTGCAAGGGCATTTCCACTGCGT 
CGGCCCTCCCAGCCCCCGCCGCGCCCAGCCTCGCGTCCAAGTGGGACACAGACAGCTC 
CACACAGCCGGTGCACCGGTGCCGGAGAAACCCAAGAGGGGTAGGGACAGGCTGAGCT 
ACACAGCCCCTCCATCGTTTAAGGAGCATGCGACAAATGAAAACAGAGGTGAAGTAGC
CAGTAAACCCAGTGAACTGGCCCACATCATGGCCAGCAATACCATGCAAGTGGAAGAG
CCACCCAAGTCTCCTGAGAAAATGTGGCCTAAAGATGAAAATTTTGAACAGAGAAGCT
CATTGGAGTGTGCTCAGAAGGCTGCAGAGCTTCGGGCTTCCATTAAAGAGAAGGTAGA 
GCTGATTCGACTTAAGAAGCTCTTACATGAAAGAAATGCTTCATTGGTTATGACAAAA
GClAC^TTAACAGAAGTTCAAGAGGTCAGrrGCCATCTTTTGACCCAGAATCAGGGAA 
TCCTGAGTGCAGCCCATGAGGCCCTCCTCAAGCAAGTGAATGAGCTCAGGGCAGAGCT 
GAAGGAAGAAAGCAAGAAGGCTGTGAGCTTGAAGAGCCAACTGGAAGATGTGTCTATC
TTGCAGATGACTCTGAAGGAGTTTCAGGAGAGAGTTGAAGATTTGGAAAAAGAACGAA 
AATTGCTGAATGACAATTATCACAAACTCTTAGAAAGCAGTGACAGCTCCAGTCAGCC 
CCACTGGAGCAACGAGCTCATAGCGGAACAGCTACAGCAGCAAGTCTCTCAGCTGCAG
GATCAGCTGGATGCTGAGCTGGAGGACAAGAGAAAAGTTTTACTTGAGCTGTCCAGGG
AGAAAGCCCAAAATGAGGATCTGAAGCTTGAAGTCACCAACATACTTCAGAAGCATAA 
ACAGGAAGTAGAGCTCCTCCAAAATGCAGCCACAATTTCCCAACCTCCTGACAGGCAA
TCTGAACCAGCCACTCACCCAGCTGTATTGCAAGAGAACACTCAGATCCAGCCAAGTG 
AACCCAAAAACCAAGAAGAAAAGAAACTGTCCCAGGTGCTAAATGAGTTGCAAGTATC 
ACACGCAGAGACCACATTGGAACTAGAAAAGACCAGGGACATGCTTATTCTGCAGCGC
AAAATCAACGTGTGTTATCAGGAGGAACTGGAGGCAATGATGACAAAAGCTGACAATG
ATAATAGAGATCACAAAGAAAAGCTGGAGAGGTTGACTCGACTACTAGACCTCAAGAA 
TAACCGTATCAAGCAGCTGGAAGAACAGCTCAAAGATGTTGCTTATGGCACCCGACCG
TTGTCGTTATCTTTGGAAACACTGCCAGCCCATGGAGATGAGGATAAAGTGGATATTT
CTCTGCTGCATCAGGGTGAGAATCTTTTTGAACTGCACATCCACCAGGCCTTCCTGAC
ATCTCCCGCCCTAGCTCAGGCTGGAGATACCCAACCTACCACTTTCTGCACCTATTCC 
TTCTATGACTTTCAAACCCACTCTACCCCATTATCTGTGGGGCCACAGCCCCTCTATG 
ACTTCACCTCCCAGTATGTGATCGAGACAGATTCGCTTTTCTTACACTACCTTCAAGA 
GGCTTCAGCCCGGCTTGACATACACCAGGCCATGGCCAGTCAACACAGCACTCTTGCT 
GCAGGATGGATTTGCTTTGACAGGGTGCTAGAGACTGTGGAGAAAGTCCATGGCTTGG
CCACACTGATTGGTGCTGGTGGAGAAGAGTTCGGGGTTCTAGAGTACTGGATGAGGCT
GCGTTTCCCCATAAAACCCAGCCTACAGGCGTGCAATAAACGAAAGAAAGCCCAGGTC
TACCTGTCAACCGATGTGCTTGGAGGCCGGAAGGCCCAGGAAGAGGAGGTGAGATCGG 
AGTCTTGGGAACCTCAGAACGAGCTGTGGATTGAAATCACCAAGTGCTGTGGCCTCCG
GAGTCGATGGCTGGGAACTCAACCCAGTCCATATGCTGTGTACCGCTTCTTCACCTTT 
TCTGACCATGACACTGCCATCATTCCAGCCAGTAACAACCCCTACTTTAGAGACCAGG 
CTCGATTCCCAGTGCTTGTGACCTCTGACCTGGACCATTATCTGAGACGGGAGGCCTT
GTCTATACATGTTTTTGATGATGAAGACTTAGAGCCTGGCTCGTATCTTGGCCGAGCC 
CGAGTGCCTTTACTGCCTCTTGCAAAAAATGAATCTATCAAAGGTGATTTTAACCTCA 
CTGACCCTGCAGAGAAACCCAACGGATCTATTCAAGTGCAACTGGATTGGAAGTTTCC 
CTACATACCCCCTGAGAGCTTCCTGAAACCAGAAGCTCAGACTAAGGGGAAGGATACC 
AAGGACAGTTCAAAGATCTCATCTGAAGAGGAAAAGGCTTCATTTCCTTCCCAGGATC
AGATGGCATCTCCTGAGGTTCCCATTGAAGCTGGCCAGTATCGATCTAAGAGAAAACC
TCCTCATGGGGGAGAAAGAAAGGAGAAGGAGCACCAGGTTGTGAGCTACTCAAGAAGA 
AAACATGGCAAAAGAATAGGTGTTCAAGGAAAGAATAGAATGGAGTATCTTAGCCTTA 
ACATCTTAAATGGAAATACACTGAAGCAGGTGAATTACACTGAGTGGAAGTTCTCAGA 
GACTAACAGCTTCATAGGTGATGGCTTTAAAAATCAGCACGAGGAAGAGGAAATGACA
TTATCCCATTCAGCACTGAAACAGAAGGAACCTCTACATCCTGTAAATGACAAAGAAT 
CCTCTGAACAAGGTTCTGAAGTCAGTGAAGCACAAACTACCGACAGTGATGATGTCAT 
AGTGCCACCCATGTCTCAGAAATATCCTAAGGCAGATTCAGAGAAGATGTGCATTGAA
ATTGTCTCCCTGGCCTTCTACCCAGAGGCAGAAGTGATGTCTGATGAGAACATAAAAC 
AGGTGTATGTGGAGTACAAATTCTACGACCTACCCTTGTCGGAGACAGAGACTCCAGT 
GTCCCTAAGGAAGCCTAGGGCAGGAGAAGAAATCCACTTTCACTTTAGCAAGGTAATA 
GACCTGGACCCACAGGAGCAGCAAGGCCGAAGGCGGTTTCTGTTCGACATGCTGAATG 
GACAAGATCCTGATCAAGGACAGTTAAAGTTTACAGTGGTAAGTGATCCTCTGGATGA 
AGAAAAGAAAGAATGTGAAGAAGTGGGATATGCATATCTTCAACTGTGGCAGATCCTG
GAGTCAGGAAGAGATATTCTAGAGCAAGAGCTAGACGTTGTTAGCCCTGAAGATCTGG
CTACCCCAATAGGAAGGCTGAAGGTTTCCCTTCAAGCAGCTGCTGTCCTCCATGCTAT 
TTACAAGGAGATGACTGAAGATTTGTTTTCATGAAGGAACAA




ORF Start: ATG at 23 


ORF Stop: TGA at 3686 




SEQ ID NO: 38


1221 aa MW at 139825.2kD 


NOV12a,

CG57819-01 Protein Sequence 


MFSGTLGKNMKTQPPLSRMNREELEDSFFRLREDHMLVKELSWKQQDEIKRLRTTLLR
LTAAGRDLRVAEEAAPLSETARRGQKAGWRQRLSMHQRPQMHRLQGHFHCVGPASPRR
AQPRVQVGHRQLHTAGAPVPEKPKRGRDRLSYTAPPSFKEHATNENRGEVASKPSELA 
HIMASNTMQVEEPPKSPEKMWPKDENFEQRSSLECAQKAAELRASIKEKVELIRLKKL 
LHERNASLVMTKAQLTEVQEVSCHLLTQNQGILSAAHEALLKQVNELRAELKEESKKA 
VSLKSQLEDVSILQMTLKEFQERVEDLEKERKLLNDNYDKLLESSDSSSQPHWSNELI 
AEQLQQQVSQLQDQLDAELEDKRKVLLELSREKAQNEDLKLEVTNILQKHKQEVELLQ
NAATISQPPDRQSEPATHPAVLQENTQIQPSEPKNQEEKKLSQVLNELQVSHAETTLE
LEKTRDMLILQRKINVCYQEELEAMMTKADNDNRDHKEKLSRLTRLLDLKNNRIKQLE
EQLKDVAYGTRPLSLCLETLPAHGDEDKVDISLLHQGENLFELHIHQAFLTSAALAQA
GDTQPTTFCTYSFYDFETHCTPLSVGPQPLYDFTSQYVMETDSLFLHYLQEASARLDI
HQAMASEHSTLAAGWICFDRVLETVEKVHGLATLIGAGGEEFGVLEYWMRLRFPIKPS
LQACNKRKKAQVYLSTDVLGGRKAQEEEVRSESWEPQNELWIEITKCCGLRSRWLGTQ
PSPYAVYRFFTFSDHDTAIIPASNNPYFRDQARFPVLVTSDLDHYLRREALSIHVFDD
EDLEPGSYLGRARVPLLPLAKNESIKGDFNLTDPAEKPNGSIQVQLDWKFPYIPPESF
LKPEAQTKGKDTKDSSKISSEEEKASFPSQDQMASPEVPIEAGQYRSKRKPPHGGERK 
EKEHQVVSYSRRKHGKRIGVQGKNRMEYLSLNILNGNTLKQVNYTEWKFSETNSFIGD
GFKNQHEEEEMTLSHSALKQKEPLHPVNDKESSEQGSEVSEAQTTDSDDVIVPPMSQK 
YPKADSEKMCIEIVSLAFYPEAEVMSDENIKQVYVEYKFYDLPLSETETPVSLRKPRA
GEEIHFHFSKVIDLDPQEQQGRRRFLFDMLNGQDPDQGQLKFTVVSDPLDEEKKECEE
VGYAYLQLWQILESGRDILEQELDVVSPEDLATPIGRLKVSLQAAAVLHAIYKEMTED
LFS 



Further analysis of the NOV12a protein yielded the following properties shown in 
Table 12B. 



Table 12B. Protein Sequence Properties NOV12a


PSort 
analysis: 


0.9600 probability located in nucleus; 0.3000 probability located in 
microbody (peroxisome); 0.1000 probability located in mitochondrial matrix 
space; 0.1000 probability located in lysosome (lumen) 




SignalP
analysis:


No Known Signal Sequence Predicted



A search of the NOV12a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 12C.



Table 12C. Geneseq Results for NOV12a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV12a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region


Expect 
Value 


AAM78558 


Human protein SEQ ID NO 1220 - 
Homo sapiens, 1179 aa.
[WO200157190-A2, 09-AUG-2001] 


63..1219 
47..1177 


400/1193(33%) 
640/1193 (53%) 


e-172


AAM79542 


Human protein SEQ ID NO 3188 -
Homo sapiens, 1160 aa.
[WO200157190-A2, 09-AUG-2001] 


63..1219 
28..1158 


400/1193 (33%)
640/1193 (53%) 


e-172 


AAM41414 


Human polypeptide SEQ ID NO 
6345 - Homo sapiens, 1160 aa.
[WO200153312-A1, 26-JUL-2001] 


63..1219 
28..1158 


400/1193 (33%) 
640/1193 (53%) 


e-172 


AAM39628 


Human polypeptide SEQ ID NO 
2773 - Homo sapiens, 1128 aa.
[WO200153312-A1, 26-JUL-2001] 


118..1219 
47..1126


390/1138(34%) 
623/1138(54%) 


e-171 


AAG75661 


Human colon cancer antigen protein 
SEQ ID NO:6425 - Homo sapiens, 
118 aa. [WO200122920-A2,
05-APR-2001]


445..523 
33..111 


40/79 (50%) 
56/79 (70%) 


1e-13



In a BLAST search of public sequence databases, the NOV12a protein was found to
have homology to the proteins shown in the BLASTP data in Table 12D.



Table 12D. Public BLASTP Results for NOV12a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV12a 
Residues/ 

Match 
Residues 


Identities/

Similarities for the 
Matched Portion 


Expect 
Value 


Q96KN7 


RPGR-INTERACTING 
PROTEIN 1 - Homo sapiens 
(Human), 1286 aa. 


7..1221 
29..1286 


1203/1258 (95%) 
1207/1258 (95%) 


0.0 


Q96QA8 


RPGR-INTERACTING 
PROTEIN 1 - Homo sapiens 
(Human), 1286 aa. 


7..1221 
29..1286 


1203/1258 (95%) 
1207/1258 (95%) 


0.0 


Q9GLM3 


RPGR-INTERACTING 
PROTEIN-1 - Bos taurus 
(Bovine), 1221 aa. 


1..1221 
1..1221 


922/1234 (74%) 
1031/1234 (82%) 


0.0 


Q9NR40 


RPGR-INTERACTING 
PROTEIN - Homo sapiens 
(Human), 902 aa. 


331..1221 
1..902 


883/902 (97%) 
888/902 (97%) 


0.0 


Q9HBK6 


RPGR-INTERACTING 
PROTEIN-1 - Homo sapiens 
(Human), 762 aa. 


471..1221 
1..762 


742/763 (97%) 
746/763 (97%) 


0.0 



PFam analysis predicts that the NOV12a protein contains the domains shown in the 
Table 12E. 



Table 12E. Domain Analysis of NOV12a


Pfam Domain 


NOV12a Match Region


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


PFEMP: domain 1 of 1 


293..413 


23/176 (13%) 
82/176 (47%) 


7.9 


C2: domain 1 of 1 


736..825


14/101 (14%) 
54/101 (53%) 


1.4 



Example 13. 



The NOV13 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 13A.



Table 13A. NOV13 Sequence Analysis




SEQ ID NO: 39 


678 bp 


NOV13a, 

CG57789-01 DNA Sequence 


TGGGGCGGGAGGCATGGTCTCCACCTACCGGGTGGCCGTGCTGGGGGCGCGAGGTGTG 
GGCAAGAGTGCCATCGTGCGCCAGTTCTTGTACAACGAGTTCAGCGAGGTCTGCGTCC 
CCACCACCGCCCGCCGCCTTTACCTGCCTGCTGTCGTCATGAACGGCCACGTGCACGA 
CCTCCAGATCCTCGACTTTCCACCCATCAGCGCCTTCCCTGTCAATACGCTCCAGGAG 
TGGGCAGACACCTGCTGCAGGGGACTCCGGAGTGTCCACGCCTACATCCTGGTCTACG 
ACATCTGCTGCTTTGACAGCTTTGAGTACGTCAAGACCATCCGCCAGCAGATCCTGGA 
GACGAGGGTGATCGGAACCTCAGAGACGCCCATCATCATCGTGGGCAACAAGCGGGAC
CTGCAGCGCGGACGCGTGATCCCGCGCTGGAACGTGTCGCACCTGGTACGCAAGACCT 
GGAAGTGCGGCTACGTGGAATGCTCGGCCAAGTACAACTGGCACATCCTGCTGCTCTT 
CAGCGAGCTGCTCAAGAGCGTCGGCTGCGCCCGTTGCAAGCACGTGCACGCTGCCCTG 
CGCTTCCAGGGCGCGCTGCGCCGCAACCGCTGCGCCATCATGTGACGCCTGCGCGCCC 
CTCGGGCTGCACCGGCACTGGCCGAGCGGAGGGCGGGGCC




ORF Start: ATG at 14


ORF Stop: TGA at 623 




SEQ ID NO: 40 


203 aa 


MW at 23229.0kD 


NOV13a, 

CG57789-01 Protein Sequence 


MVSTYRVAVLGARGVGKSAIVRQFLYNEFSEVCVPTTARRLYLPAVVMNGHVHDLQIL
DFPPISAFPVNTLQEWADTCCRGLRSVHAYILVYDICCFDSFEYVKTIRQQILETRVI
GTSETPIIIVGNKRDLQRGRVIPRWNVSHLVRKTWKCGYVECSAKYNWHILLLFSELL
KSVGCARCKHVHAALRFQGALRRNRCAIM 




SEQ ID NO: 41 


682 bp 


NOV13b,

CG57789-02 DNA Sequence 


TGGGAGGCATGGTCTCCACCTACCGGGTGGCCGTGCTGGGGGCGCGAGGTGTGGGCAA 
GAGTGCCATCGTGCGCCAGTTCTTGTACAACGAGTTCAGCGAGGTCTGCGTCCCCACC 
ACCGCCCGCCGCCTTTACCTGCCTGCTGTCGTCATGAACGGCCACGTGCACGACCTCC
AGATCCTCGACTTTCCACCCATCAGCGCCTTCCCTGTCAATACGCTCCAGGAGTGGGC 
AGACACCTGCTGCAGGGGACTCCGGAGTGTCCACGCCTACATCCTGGTCTACGACATC
TGCTGCTTTGACAGCTTTGAGTACGTCAAGACCATCCGCCAGCAGATCCTGGAGACGA 
GGGTGATCGGAACCTCAGAGACGCCCATCATCATCGTGGGCAACAAGCGGGACCTGCA 
GCGCGGACGCGTGATCCCGCGCTGGAACGTGTCGCACCTGGTACGCAAGACCTGGAAG 
TGCGGCTACGTGGAATGCTCGGCCAAGTACAACTGGCACATCCTGCTGCTCTTCAGCG 
AGCTGCTCAAGAGCGTCGGCTGCGCCCGTTGCAAGCACGTGCACGCTGCCCTGCGCTT 
CCAGGGCGCGCTGCGCCGCAACCGCTGCGCCATCATGTGACGCCTGCGCGCCCCTCGG 
GCTGCACCGGCACTGGCCGAGCGGAGGGCACTGGCCGAGCGGAG 




ORF Start: ATG at 9 


ORF Stop: TGA at 618 




SEQ ID NO: 42


203 aa 


MW at 23229.0kD 


NOV13b,

CG57789-02 Protein Sequence 


MVSTYRVAVLGARGVGKSAIVRQFLYNEFSEVCVPTTARRLYLPAVVMNGHVHDLQIL
DFPPISAFPVNTLQEWADTCCRGLRSVHAYILVYDICCFDSFEYVKTIRQQILETRVI
GTSETPIIIVGNKRDLQRGRVIPRWNVSHLVRKTWKCGYVECSAKYNWHILLLFSELL
KSVGCARCKHVHAALRFQGALRRNRCAIM 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 13B. 



Table 13B. Comparison of NOV13a against NOV13b. 


Protein 
Sequence 


NOV13a Residues/
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV13b 


1..203
1..203


203/203 (100%) 
203/203(100%) 



Further analysis of the NOV13a protein yielded the following properties shown in
Table 13C.



Table 13C. Protein Sequence Properties NOV13a


PSort analysis: 


0.6500 probability located in plasma membrane; 0.5064 probability located 
in mitochondrial matrix space; 0.3844 probability located in microbody 
(peroxisome); 0.2556 probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV13a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 13D.



Table 13D. Geneseq Results for NOV13a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date]


NOV13a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB42840 


Human ORFX ORF2604 polypeptide 
sequence SEQ ID NO:5208 - Homo 
sapiens, 136 aa. [WO200058473-A2, 
05-OCT-2000] 


1..136 
1..136 


136/136(100%) 
136/136(100%) 


2e-75 


AAM41682 


Human polypeptide SEQ ID NO 6613 
- Homo sapiens, 206 aa. 
[WO200153312-A1, 26-JUL-2001] 


5..174
15..173 


66/171 (38%) 
89/171 (51%) 


4e-18 


AAM39896 


Human polypeptide SEQ ID NO 3041 
- Homo sapiens, 199 aa. 
[WO200153312-A1, 26-JUL-2001] 


5..174 
8..166 


66/171 (38%) 
89/171 (51%) 


4e-18 


AAY99656 


Human GTPase associated protein-7 - 
Homo sapiens, 281 aa. 
[WO200031263-A2, 02-JUN-2000] 


5..173 
25..191 


59/179(32%) 
87/179(47%) 


3e-14 


AAR05075 


RAP1A Gene product incorporating 
at least one peptide associated with 
ras oncogene - Synthetic, 184 aa. 
[WO9000179-A, 11-JAN-1990]


5..177
4..165


57/175 (32%) 
90/175 (50%) 


5e-14 



In a BLAST search of public sequence databases, the NOV13a protein was found to
have homology to the proteins shown in the BLASTP data in Table 13E.



Table 13E. Public BLASTP Results for NOV13a


Protein 
Accession 
Number 


Protein/Organism/Length


NOV13a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96S79 


RAS-LIKE PROTEIN/VTS58635 - 
Homo sapiens (Human), 203 aa. 


1..203 
1..203 


203/203 (100%)
203/203 (100%)


e-118 


Q92737 


Ras-like protein RRP22 (RAS-related 
protein on chromosome 22) - Homo 
sapiens (Human), 203 aa. 


1..203 
1..203 


105/204(51%) 
134/204(65%) 


3e-50 


Q95KD9 


HYPOTHETICAL 22.5 KDA 
PROTEIN - Macaca fascicularis 
(Crab eating macaque) (Cynomolgus 
monkey), 199 aa. 


5..174 
8..166 


66/171 (38%)
89/171 (51%) 


1e-17


Q96HU8 


SIMILAR TO CG8500 GENE 
PRODUCT - Homo sapiens 
(Human), 199 aa. 


5..174
8..166


66/171 (38%) 
89/171 (51%) 


1e-17


Q9NF75 


EG:BACR37P7.8 PROTEIN - 
Drosophila melanogaster (Fruit fly), 
306 aa. 


5..174
48..210 


61/174 (35%)
88/174 (50%)


4e-16 



PFam analysis predicts that the NOV13a protein contains the domains shown in
Table 13F.



Table 13F. Domain Analysis of NOV13a


Pfam Domain 


NOV13a Match
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


Semialdhyde_dh: domain 1 of 1


4..14 


4/11(36%) 
11/11 (100%) 


0.75 


ras: domain 1 of 1 


6..203 


56/224 (25%) 
125/224 (56%) 


1.2e-12 



Example 14. 



The NOV14 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 14A.



Table 14A. NOV14 Sequence Analysis




SEQ ID NO: 43 


1790 bp 


NOV14a,

CG57758-01 DNA Sequence 


TCTCCCTCCCGCGCGATGGCCTCGGCGCTGAGCTATGTCTCCAAGTTCAAGTCCTTCG
TGATCTTGTTCGTCACCCCGCTCCTGCTGCTGCCACTCGTCATTCTGATGCCCGCCAA 
GGTCAGTTGTGCCTACGTCATCATCCTCATGGCCATTTACTGGTGCACAGAAGTCATC
CCTCTGGCTGTCACCTCTCTCATGCCTGTCTTGCTTTTCCCACTCTTCCAGATTCTGG 
ACTCCAGGCAGGTGTGTGTCCAGTACATGAAGGACACCAACATGCTGTTCCTGGGCGG 
CCTCATCGTGGCCGTGGCTGTGGAGCGCTGGAACCTGCACAAGAGGATCGCCCTGCGC
ACGCTCCTCTGGGTGGGGGCCAAGCCTGCACGGCTGATGCTGGGCTTCATGGGCGTCA 
CAGCCCTCCTGTCCATGTGGATCAGTAACACGGCAACCACGGCCATGATGGTGCCCAT 
CGTGGAGGCCATATTGCAGCAGATGGAAGCCACAAGCGCAGCCACCGAGGCCGGCCTG 
GAGCTGGTGGACAAGGGCAAGGCCAAGGAGCTGCCAGGGAGTCAAGTGATTTTTGAAG
GCCCCACTCTGGGGCAGCAGGAAGACCAAGAGCGGAAGAGGTTGTGTAAGGCCATGAC
CCTGTGCATCTGCTACGCGGCCAGCATCGGGGGCACCGCCACCCTGACCGGGACGGGA 
CCCAACGTGGTGCTCCTGGGCCAGATGAACGAGTTGTTTCCTGACAGCAAGGACCTCG 
TGAACTTTGCTTCCTGGTTTGCATTTGCCTTTCCCAACATGCTGGTGATGCTGCTGTT 
CGCCTGGCTGTGGCTCCAGTTTGTTTACATGTTCTCCAGTTTTAAAAAGTCCTGGGGC
TGCGGGCTAGAGAGCAAGAAAAACGAGAAGGCTGCCCTCAAGGTGCTGCAGGAGGAGT
ACCGGAAGCTGGGGCCCTTGTCCTTCGCGGAGATCAACGTGCTGATCTGCTTCTTCCT 
GCTGGTCATCCTGTGGTTCTCCCGAGACCCCGGCTTCATGCCCGGCTGGCTGACTGTT
GCCTGGGTGGAGGGTGAGACAAAGTATGTCTCCGATGCCACTGTGGCCATCTTTGTGG 
CCACCCTGCTATTCATTGTGCCTTCACAGAAGCCCAAGTTTAACTTCCGCAGCCAGAC
TGAGGAAGGTAAGTCTCCTGTTCTGATCGCCCCCCCTCCCCTGCTGGATTGGAAGGTA 
ACCCAGGAGAAAGTGCCCTGGGGCATCGTGCTGCTACTAGGGGGCGGATTTGCTCTGG
CTAAAGGATCCGAGGCCTCGGGGCTGTCCGTGTGGATGGGGAAGCAGATGGAGCCCTT 
GCACGCAGTGCCCCCGGCAGCCATCACCTTGATCTTGTCCTTGCTCGTTGCCGTGTTC
ACTGAGTGCACAAGCAACGTGGCCACCACCACCTTGTTCCTGCCCATCTTTGCCTCCA 
TGTCTCGCTCCATCGGCCTCAATCCGCTGTACATCATGCTGCCCTGTACCCTGAGTGC 
CTCCTTTGCCTTCATGTTGCCTGTGGCCACCCCTCCAAATGCCATCGTGTTCACCTAT 
GGGCACCTCAAGGTTGCTGACATGGTGAAAACAGGAGTCATAATGAACATAATTGGAG 
TCTTCTGTGTGTTTTTGGCTGTCAACACCTGGGGACGGGCCATATTTGACTTGGATCA 
TTTCCCTGACTGGGCTAATGTGACACATATTGAGACTTAGGAAGAGCCACAAGACCAC 
ACACACAGCCCTTACCCTCCTCAGGACTACCGAACCTTCTGGCACACCTT 




ORF Start: ATG at 16 | ORF Stop: TAG at 1720

SEQ ID NO: 44 | 568 aa | MW at 62592.9kD

NOV14a, CG57758-01 Protein Sequence


MASALSYVSKFKSFVILFVTPLLLLPLVILMPAKVSCAYVIILMAIYWCTEVIPLAVT 
SLMPVLLFPLFQILDSRQVCVQYMKDTNMLFLGGLIVAVAVERWNLHKRIALRTLLWV 
GAKPARLMLGFMGVTALLSMWISNTATTAMMVPIVEAILQQMEATSAATEAGLELVDK
GKAKELPGSQVIFEGPTLGQQEDQERKRLCKAMTLCICYAASIGGTATLTGTGPNVVL
LGQMNELFPDSKDLVNFASWFAFAFPNMLVMLLFAWLWLQFVYMFSSFKKSWGCGLES
KKNEKAALKVLQEEYRKLGPLSFAEINVLICFFLLVILWFSRDPGFMPGWLTVAWVEG
ETKYVSDATVAIFVATLLFIVPSQKPKFNFRSQTEEGKSPVLIAPPPLLDWKVTQEKV
PWGIVLLLGGGFALAKGSEASGLSVWMGKQMEPLHAVPPAAITLILSLLVAVFTECTS
NVATTTLFLPIFASMSRSIGLNPLYIMLPCTLSASFAFMLPVATPPNAIVFTYGHLKV
ADMVKTGVIMNIIGVFCVFLAVNTWGRAIFDLDHFPDWANVTHIET




SEQ ID NO: 45 | 1899 bp

NOV14b, CG57758-02 DNA Sequence


CGTCTCGCCCGCCAGTCTCCCTCCCGCGCGATGGCCTCGGCGCTGAGCTATGTCTCCA 
AGTTCAAGTCCTTCGTGATCTTGTTCGTCACCCCGCTCCTGCTGCTGCCACTCGTCAT 
TCTGATGCCCGCCAAGGTCAGTTGCTGTGCCTACGTCATCATCCTCATGGCCATTTAC
TGGTGCACAGAAGTCATCCCTCTGGCTGTCACCTCTCTCATGCCTGTCTTGCTTTTCC 
CACTCTTCCAGATTCTGGACTCCAGGCAGGTGTGTGTCCAGTACATGAAGGACACCAA 
CATGCTGTTCCTGGGCGGCCTCATCGTGGCCGTGGCTGTGGAGCGCTGGAACCTGCAC 
AAGAGGATCGCCCTGCGCACGCTCCTCTGGGTGGGGGCCAAGCCTGCACGGCTGATGC 
TGGGCTTCATGGGCGTCACAGCCCTCCTGTCCATGTGGATCAGTAACACGGCAACCAC 
GGCC ATGATGGTG CCCATCGTGGAGGC CATATTGCAG CAGATGG AAGCCACAAGCGCA 
GCCACCGAGGCCGGCCTGGAGGGACAAGGTACCACAATAAACAACCTGAATGCACTGG 
AGGATGATACAGTGAAAGCAGTACTAGGAGGAAAGTGTGTAGCTATAATAAGCACTTA 
CGT CAAAAAAGTAGAAAAACTTCAAAT AAACAATCTAATG ACACCTCTTAAAAAACT A 
G AAAAG CAAGAGCAACAGGACCTAGGGCCTGG CATCAGGCCTC AGGACT CTGCCCAGT 
GCCAGGAAGACCAAG AGCGG AAG AGGTTGTGTAAGGCCATGACCCTGTG CATCTG CT A 
CGCGGCCAGCATCGGGGGCACCGCCACCCTGACCGGGACGGGACCCAACGTGGTGCTC 
CTGGGCCAGATGAACGAGTTGTTTCCTGACAGCAAGGACCTCGTGAACTTTGCTTCCT 
GGTTTGCATTTGCCTTTCCCAACATGCTGGTGATGCTGCTGTTCGCCTGGCTGTGGCT 
CCAGTTTGTTTACATGTTCTCCAGTTTTAAAAAGTCCTGGGGCTGCGGGCTAGAGAGC 
AAGAAAAACGAGAAGGCTGCCCTCAAGGTGCTGCAGGAGGAGTACCGGAAGCTGGGGC 
CCTTGTCCTTCGCGGAGATCAACGTGCTGATCTGCTTCTTCCTGCTGGTCATCCTGTG 
GTTCTCCCGAGACCCCGGCTTCATGCCCGGCTGGCTCACTGTTGCCTGGGTGGAGGGT
GAGACAAAGTCAGTCTCCGATGCCACTGTGG CCATCTTTGTGG CCACCCTG CTATTCA 
TTGTGCCTTCACAGAAGCCCAAGTTTAACTTCCGCAGCCAGACTGAGGAAGGTAAGTC 
TCCTGTTCTGATCGCCCCCCCTCCCCTGCTGGATTGGAAGGTAACCCAGGAGAAAGTG 
CCCTGGGGCATCGTGCTGCTACTAGGGGGCGGATTTGCTCTGGCTAAAGGATCCGAGG
CCTCGGGGCTGTCCGTGTGGATGGGGAAGCAGATGGAGCCCTTGCACGCAGTGCCCCC
GGCAGCCATCACCTTGATCTTGTCCTTGCTCGTTGCCGTGTTCACTGAGTGCACAAGC 
AACGTGGCCACCACCACCTTGTTCCTGCCCATCTTTGCCTCCATGTCTCGCTCCATCG 
GCCTCAATCCGCTGTACATCATGCTGCCCTGTACCCTGAGTGCCTCCTTTGCCTTCAT 
GTTGCCTGTGGCCACCCCTCCAAATGCCATCGTGTTCACCTATGGGCACCTCAAGGTT 
GCTGACATGGTAAAAACAGGAGTCATAATGAACATAATTGGAGTCTTCTGTGTGTTTT 
TGGCTGTCAACACCTGGGGACGGGCCATATTTGACTTGGATCATTTCCCTGACTGGGC 
TAATGTGACACATATTGAGACTTAGGAAGAGCCACAAGACCAC 




ORF Start: ATG at 31 | ORF Stop: TAG at 1879

SEQ ID NO: 46 | 616 aa | MW at 67816.9kD

NOV14b, CG57758-02 Protein Sequence


MASALSYVSKFKSFVILFVTPLLLLPLVILMPAKVSCCAYVIILMAIYWCTEVIPLAV
TSLMPVLLFPLFQILDSRQVCVQYMKDTNMLFLGGLIVAVAVERWNLHKRIALRTLLW
VGAKPARLMLGFMGVTALLSMWISNTATTAMMVPIVEAILQQMEATSAATEAGLEGQG
TTINNLNALEDDTVKAVLGGKCVAIISTYVKKVEKLQINNLMTPLKKLEKQEQQDLGP
GIRPQDSAQCQEDQERKRLCKAMTLCICYAASIGGTATLTGTGPNVVLLGQMNELFPD
SKDLVNFASWFAFAFPNMLVMLLFAWLWLQFVYMFSSFKKSWGCGLESKKNEKAALKV
LQEEYRKLGPLSFAEINVLICFFLLVILWFSRDPGFMPGWLTVAWVEGETKSVSDATV
AIFVATLLFIVPSQKPKFNFRSQTEEGKSPVLIAPPPLLDWKVTQEKVPWGIVLLLGG
GFALAKGSEASGLSVWMGKQMEPLHAVPPAAITLILSLLVAVFTECTSNVATTTLFLP
IFASMSRSIGLNPLYIMLPCTLSASFAFMLPVATPPNAIVFTYGHLKVADMVKTGVIM
NIIGVFCVFLAVNTWGRAIFDLDHFPDWANVTHIET




SEQ ID NO: 47 | 1899 bp

NOV14c, CG57758-03 DNA Sequence


CGTCTCGCCCGCCAGTCTCCCTCCCGCGCGATGGCCTCGGCGCTGAGCTATGTCTCCA 
AGTTCAAGTCCTTCGTGATCTTGTTCGTCACCCCGCTCCTGCTGCTGCCACTCGTCAT 
TCTGATGCCCGCCAAGGTCAGTTGCTGTGCCTACGTCATCATCCTCATGGCCATTTAC 
TGGTG CAC AG AAG T CAT C C CT CTGG CTG T CACCT CTCT CATGCCTGT CTTG CTTTT C C 
CACTCTTCCAG ATT CTGGACT CCAGGCAGGTGTGTGTC CAGTACATGAAGGACACCAA 
CATGCTGTTCCTGGGCGGCCTCATCGTGGCCGTGGCTGTGGAGCGCTGGAACCTGCAC 
AAGAGGATCGCCCTGCGCACGCTCCTCTGGGTGGGGGCCAAGCCTGCACGGCTGATGC 
TGGGCTTCATGGGCGTCACAGCCCTCCTGTCCATGTGGATCAGTAACACGGCAACCAC 
GGCCATGATGGTGCCCATCGTGGAGGCCATATTGCAGCAGATGGAAGCCACAAGCGCA 
GCCACCGAGGCCGGCCTGGAGGGACAAGGTACCACAATAAACAACCTCAATCCACTGG 
AGGATGATACAGTGAAAGCAGTACTAGGAGGAAAGTGTGTAGCTATAATAAGCACTTA 
CGTCAAAAAAGTAGAAAAACTTCAAATAAACAATCTAATGACACCTCTTAAAAAACTA 
GAAAAGCAAGAGCAACAGGACCTAGGGCCTGGCATCAGGCCTCAGGACTCTGCCCAGT 
G CCAGGAAG ACCAAG AGCGGAAGAGGTTGTGT AAGG CCATG ACCCTGTG CATCTG CTA 
CGCGGCCAGCATCGGGGGCACCG CCACCCTG ACCGGGACGGGACCCAACGTGGTG CTC 
CTGGGCCAGATGAACGAGTTCTTTCCTGACAGCAAGGACCTCGTGAACTTTGCTTCCT
GGTTTGCATTTGCCTTTCCCAACATGCTGGTGATGCTGCTCTTCGCCTGGCTGTGGCT
CCAGTTTGTTTACATGTTCTCCAGTTTTAAAAAGTCCTGGGGCTGCGGGCTAGAGAGC 
AAGAAAAACGAGAAGGCTGCCCTCAAGGTGCTGCAGGAGGAGTACCGGAAGCTGGGGC 
CCTTGTCCTTCGCGGAGATCAACGTGCTGATCTGCTTCTTCCTGCTGGTCATCCTGTG 
GTTCTCCCGAGACCCCGGCTTCATGCCCGGCTGGCTGACTGTTGCCTGGGTGGAGGGT 
GAGACAAAGTCAGTCTCCGATGCCACTGTGGCCATCTTTGTGGCCACCCTGCTATTCA
TTGTGC CTTCACAGAAGCCCAAGTTTAACTTCCGCAG CCAG ACTGAGG AAGGT AAGTC 
TCCTGTTCTGATCGCCCCCCCTCCCCTGCTGGATTGGAAGGTAACCCAGGAGAAAGTG 
CCCTGGGGCATCGTG CTGCTACTAGGGGG CGG ATTTG CTCTGG CT AAAGG ATCCGAGG 
C CT CGGGG CTG T C CGTGTGGATG GGGAAG CAGATGG AGC C CTTG C ACG C AG TG CC C CC 
GGCAGCCATCACCTTGATCTTGTCCTTGCTCGTTGCCGTGTTCACTGAGTGCACAAGC 
AACGTGGCCACCACCACCTTGTTCCTGCCCATCTTTGCCTCCATGTCTCGCTCCATCG 
GCCTCAATCCGCTGTACATCATGCTGCCCTGTACCCTGAGTGCCTCCTTTGCCTTCAT 
GTTGCCTGTGGCCACCCCTCCAAATGCCATCCTGTTCACCTATGGGCACCTCAAGGTT 
G CTGACATGGT AAAAACAGG AGT CAT AATG AACAT AATTGG AGTCTTCTGTGTGTTTT 
TGGCTGTCAACACCTGGGGACGGGCCATATTTGACTTGGATCATTTCCCTGACTGGGC 
TAATGTGACACATATTGAGACTTAGGAAGAGCCACAAGACCAC 




ORF Start: ATG at 31 | ORF Stop: TAG at 1879

SEQ ID NO: 48 | 616 aa | MW at 67816.9kD

NOV14c, CG57758-03 Protein Sequence


MASALSYVSKFKSFVILFVTPLLLLPLVILMPAKVSCCAYVIILMAIYWCTEVIPLAV
TSLMPVLLFPLFQILDSRQVCVQYMKDTNMLFLGGLIVAVAVERWNLHKRIALRTLLW
VGAKPARLMLGFMGVTALLSMWISNTATTAMMVPIVEAILQQMEATSAATEAGLEGQG
TTINNLNALEDDTVKAVLGGKCVAIISTYVKKVEKLQINNLMTPLKKLEKQEQQDLGP
GIRPQDSAQCQEDQERKRLCKAMTLCICYAASIGGTATLTGTGPNVVLLGQMNELFPD
SKDLVNFASWFAFAFPNMLVMLLFAWLWLQFVYMFSSFKKSWGCGLESKKNEKAALKV
LQEEYRKLGPLSFAEINVLICFFLLVILWFSRDPGFMPGWLTVAWVEGETKSVSDATV
AIFVATLLFIVPSQKPKFNFRSQTEEGKSPVLIAPPPLLDWKVTQEKVPWGIVLLLGG
GFALAKGSEASGLSVWMGKQMEPLHAVPPAAITLILSLLVAVFTECTSNVATTTLFLP
IFASMSRSIGLNPLYIMLPCTLSASFAFMLPVATPPNAIVFTYGHLKVADMVKTGVIM
NIIGVFCVFLAVNTWGRAIFDLDHFPDWANVTHIET




SEQ ID NO: 49 | 1606 bp

NOV14d, CG57758-04 DNA Sequence


GATGGCCTCGGCGCTGAGCTATGTCTCCAAGTTCAAGTCCTTCGTGATCTTGTTCGTC 
ACCCCGCTCCTGCTGCTGCCACTCGTCATTCTGATGCCCGCCAAGTTTGTCAGGTGTG 
CCTACGTCATCATCCTCATGGCCATTTACTGGTGCACAGAAGTCATCCCTCTGGCTGT 
CACCTCTCTCATGCCTGTCTTGCTTTTCCCACTCTTCCAGATTCTGGACTCCAGGCAG 
GTGTGTGTCCAGTACATGAAGGACACCAACATGCTGTTCCTGGGCGGCCTCATCGTGG 
CCGTGGCTGTGGAGCGCTGGAACCTGCACAAGAGGATCGCCCTGCGCACGCTCCTCTG 
GGTGGGGGCCAAGCCTGCACGGCTGATGCTGGGCTTCATGGGCGTCACAGCCCTCCTG 
TCCATGTGGATCAGTAACACGGCAACCACGGCCATGATGGTGCCCATCGTGGAGGCCA 
TATTGCAGCAGATGGAAGCCACAAGCGCAGCCACCGAGGCCGGCCTGGAGCTGGTGGA 
CAAGGGCAAGGCCAAGGAGCTG CCAGGGAGTCAAGTG ATTTTTGAAGGCC CCACTCTG 
GGG C AG CAGG AAG AC C AAG AGCGG AAG AGG TTG TG T AAGG C C ATG AC C CTG TG CAT CT 
GCTACGCGGCCAGCATCGGGGGCACCGCCACCCTGACCGGGACGGGACCCAACGTGGT 
GCTCCTGGGCCAGATGAACGAGTTGTTTCCTGACAGCAAGGACCTCGTGAACTTTGCT 
TCCTGGTTTGCATTTGCCTTTCCCAACATGCTGGTGATGCTGCTGTTCGCCTGGCTGT 
GGCTCCAGTTTGTTTACATGAGATTCAATTTTAAAAAGTCCTGGGGCTGCGGGCTAGA
GAGCAAGAAAAACGAGAAGGCTGCCCTCAAGGTGCTGCAGGAGGAGTACCGGAAGTTG 
GGGCCCTTGTCCTTCGCGGAGATCAACGTGCTGATCTGCTTCTTCCTGCTGGTCATCC 
TGTGGTTCTCCCGAGACCCCGGCTTCATGCCCGGCTGGCTGACTGTTGCCTGGGTGGA 
GGGTGAGACAAAGTATGTCTCCGATGCCACTGTGGCCATCTTTGTGGCCACCCTGCTA 
TTCATTGTGCCTTCACAGAAGCCCAAGTTTAACTTCCGCAGCCAGACTGAGGAAGAAA 
GGAAAACTCCATTTTATCCCCCTCCCCTGCTGGATTGGAAGGTAACCCAGGAGAAAGT 
GCCCTGGGGCATCGTG CTGCTACTAGGGGG CGGATTTGCTCTGG CTAAAGGATCCG AG 
GCCTCGGGGCTGTCCGTGTGGATGGGGAAGCAGATGGAGCCCTTGCACGCAGTGCCCC
CGGCAGCCATCACCTTGATCTTGTCCTTGCTCGTTGCCGTGTTCACTGAGTGCACAAG 
CAACGTGGCCACCACCACCTTGTTCCTGCCCATCTTTGCCTCCATGGTGAAAACAGGA 
GTCATAATCAACATAATTGGAGTCTTCTGTGTGTTTTTGGCTGTCAACACCTGGGGAC 
GGGCCATATTTGACTTGGATCATTTCCCTGACTGGGCTAATGTGACACATATTGAGAC 
TT AGGAAG AG CCACAAG ACCAC AC AC AT AGCCCTTACCCT 




ORF Start: ATG at 2 | ORF Stop: TAG at 1568

SEQ ID NO: 50 | 522 aa | MW at 58109.6kD

NOV14d, CG57758-04 Protein Sequence


MASALSYVSKFKSFVILFVTPLLLLPLVILMPAKFVRCAYVIILMAIYWCTEVIPLAV
TSLMPVLLFPLFQILDSRQVCVQYMKDTNMLFLGGLIVAVAVERWNLHKRIALRTLLW
VGAKPARLMLGFMGVTALLSMWISNTATTAMMVPIVEAILQQMEATSAATEAGLELVD
KGKAKELPGSQVIFEGPTLGQQEDQERKRLCKAMTLCICYAASIGGTATLTGTGPNVV
LLGQMNELFPDSKDLVNFASWFAFAFPNMLVMLLFAWLWLQFVYMRFNFKKSWGCGLE
SKKNEKAALKVLQEEYRKLGPLSFAEINVLICFFLLVILWFSRDPGFMPGWLTVAWVE
GETKYVSDATVAIFVATLLFIVPSQKPKFNFRSQTEEERKTPFYPPPLLDWKVTQEKV
PWGIVLLLGGGFALAKGSEASGLSVWMGKQMEPLHAVPPAAITLILSLLVAVFTECTS
NVATTTLFLPIFASMVKTGVIMNIIGVFCVFLAVNTWGRAIFDLDHFPDWANVTHIET




SEQ ID NO: 51 | 1781 bp

NOV14e, CG57758-05 DNA Sequence


GATGGCCTCGGCGCTGAGCTATGTCTCCAAGTTCAAGTCCTTCGTGATCTTGTTCGTC 
ACCCCGCTCCTGCTGCTGCCACTCGTCATTCTGATGCCCGCCAAGTTTGTCAGGTGTG 
CCTACGTCATCATCCTCATGGCCATTTACTGGTGCACAGAAGTCATCCCTCTGGCTGT
CACCTCTCTCATGCCTGTCTTGCTTTTCCCACTCTTCCAGATTCTGGACTCCAGGCAG 
GTGTGTGTCCAGTACATGAAGGACACCAACATGCTGTTCCTGGGCGGCCTCATCGTGG 
CCGTGGCTGTGGAGCGCTGGAACCTGCACAAGAGGATCGCCCTGCGCACGCTCCTCTG 
GGTGGGGGCCAAGCCTGCACGGCTGATGCTCGGCTTCATGGGCGTCACAGCCCTCCTG 
TCCATGTGGATCAGTAACACGGCAACCACGGCCATGATGGTGCCCATCGTGGAGGCCA 
T ATTG CAGCAG ATGG AAGCC ACAAG CG CAGCCACCGAGGCCGGCCTGGAGCTGGTGGA 
CAAGGGCAAGGCCAAGGAGCTGCCAGGGAGTCAAGTGATTTTTGAAGGCCCCACTCTG 
GGGCAGCAGGAAGACCAAGAGCGGAAGAGGTTGTGTAAGGCCATGACCCTGTGCATCT 
GCTACGCGGCCAGCATCGGGGGCACCGCCACCCTGACCGGGACGGGACCCAACGTGGT 
GCTCCTGGGCCAGATGAACGAGTTGTTTCCTGACAGCAAGGACCTCGTGAACTTTGCT 
TCCTGGTTTGCATTTGCCTTTCCCAACATGCTGGTGATGCTGCTGTTCGCCTGGCTGT 
GGCTCCAGTTTGTTTACATGAGATTCAATTTTAAAAAGTCCTGGGGCTGCGGGCTAGA 
GAG CAAGAAAAACG AGAAGGCTG CC CT C AAGGTGCTG C AGGAGGAGTACCGG AAGTTG 
GGGCCCTTGTCCTTCGCGGAGATCAACGTGCTGATCTGCTTCTTCCTGCTGGTCATCC 
TGTGGTTCTCCCGAGACCCCGGCTTCATGCCCGGCTGGCTGACTGTTGCCTGGGTGGA 
GGGTGAGACAAAGTATGTCTCCGATGCCACTGTGGCCATCTTTGTGGCCACCCTGCTA 
TTCATTGTG CCTTCACAG AAGCCCAAGTTTAACTTCCG CAGCCAGACTG AGGAAGAAA 
GG AAAACTCCATTTTATC CCCCTC CCCTGCTGGATTGGAAGGTAAC CCAGG AGAAAGT 
GCCCTGGGGCATCGTGCTGCTACTAGGGGGCGGATTTGCTCTGGCTAAAGGATCCGAG 
GCCTCGGGGCTGTCCGTCTGGATGGGGAAGCAGATGGAGCCCTTGCACGCAGTGCCCC 
CGGCAGCCATCACCTTGATCTTGTCCTTGCTCGTTGCCGTGTTCACTGAGTGCACAAG 
CAACGTGGCCACCACCACCTTGTTCCTGCCCATCTTTGCCTCCATGAATCACGTCCCC 
AAGAGCTTCTGTCTTCTGTACGGTGATGTTGCAGTGCTGTCTTTCCGCAGTCTCGCTC 
CATCGGCCTCAATCCGCTGTACATCATGCTGCCCTGTACCCTGAGTGCCTCCTTTGCC 
TTCATGTTGCCTGTGGCCACCCCTCCAAATGCCATCGTGTTCACCTATGGGCACCTCA
AGGTTGCTGACATGGTGAAAACAGGAGTCATAATGAACATAATTGGAGTCTTCTGTGT
GTTTTTGGCTGTCAACACCTGGGGACGGGCCATATTTGACTTGGATCATTTCCCTGAC
TGGGCTAATGTGACACATATTGAGACTTAGGAAGAGCCACA




ORF Start: ATG at 2 | ORF Stop: TGA at 1550

SEQ ID NO: 52 | 516 aa | MW at 57173.5kD

NOV14e, CG57758-05 Protein Sequence


MASALSYVSKFKSFVILFVTPLLLLPLVILMPAKFVRCAYVIILMAIYWCTEVIPLAV
TSLMPVLLFPLFQILDSRQVCVQYMKDTNMLFLGGLIVAVAVERWNLHKRIALRTLLW
VGAKPARLMLGFMGVTALLSMWISNTATTAMMVPIVEAILQQMEATSAATEAGLELVD
KGKAKELPGSQVIFEGPTLGQQEDQERKRLCKAMTLCICYAASIGGTATLTGTGPNVV
LLGQMNELFPDSKDLVNFASWFAFAFPNMLVMLLFAWLWLQFVYMRFNFKKSWGCGLE
SKKNEKAALKVLQEEYRKLGPLSFAEINVLICFFLLVILWFSRDPGFMPGWLTVAWVE
GETKYVSDATVAIFVATLLFIVPSQKPKFNFRSQTEEERKTPFYPPPLLDWKVTQEKV
PWGIVLLLGGGFALAKGSEASGLSVWMGKQMEPLHAVPPAAITLILSLLVAVFTECTS
NVATTTLFLPIFASMNHVPKSFCVLYGDVAVLSFRSLAPSASIRCTSCCPVP
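
The ORF coordinates and protein lengths stated in Table 14A can be cross-checked against one another. The following is a minimal sketch, assuming that "ORF Start" is the 1-based position of the first base of the ATG codon and "ORF Stop" is the 1-based position of the first base of the stop codon; the function name `orf_codon_count` is hypothetical and is used only for illustration.

```python
def orf_codon_count(orf_start: int, orf_stop: int) -> int:
    """Return the number of coding codons in an ORF.

    Assumes 1-based coordinates, with orf_stop pointing at the first
    base of the stop codon (an assumption, not stated in the tables).
    """
    span = orf_stop - orf_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF start and stop are not in the same reading frame")
    return span // 3


# (ORF start, ORF stop, stated protein length in aa), as listed in Table 14A.
TABLE_14A = {
    "NOV14a": (16, 1720, 568),
    "NOV14b": (31, 1879, 616),
    "NOV14c": (31, 1879, 616),
    "NOV14d": (2, 1568, 522),
    "NOV14e": (2, 1550, 516),
}

for name, (start, stop, stated_aa) in TABLE_14A.items():
    # Each ORF should encode exactly the stated number of residues.
    assert orf_codon_count(start, stop) == stated_aa, name
```

Under these assumptions, each listed ORF span divides evenly into codons and matches the stated polypeptide length.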



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 14B. 



Table 14B. Comparison of NOV14a against NOV14b through NOV14e.

Protein Sequence | NOV14a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV14b | 1..568; 1..616 | 519/616 (84%); 524/616 (84%)
NOV14c | 1..568; 1..616 | 519/616 (84%); 524/616 (84%)
NOV14d | 1..568; 1..522 | 483/570 (84%); 485/570 (84%)
NOV14e | 1..480; 1..480 | 440/482 (91%); 443/482 (91%)
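
The identity figures in a comparison such as Table 14B are counts of matching alignment columns over the total alignment length. The following is a minimal sketch of that calculation, assuming the two sequences have already been aligned and padded with "-" gap characters; the function name `identity_stats`, the toy input strings, and the choice to round the percentage down are all assumptions made for illustration.

```python
def identity_stats(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (identical columns, alignment length, integer percent identity).

    Both inputs must be pre-aligned strings of equal length. The percentage
    is rounded down, which is an assumption about the reporting convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = len(aligned_a)
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return identical, columns, 100 * identical // columns


# Toy alignment (hypothetical): one gap column out of eight.
print(identity_stats("MASALSYV", "MAS-LSYV"))
```

A similarity count would be obtained the same way, with the equality test replaced by a lookup in a substitution matrix.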



Further analysis of the NOV14a protein yielded the following properties shown in
Table 14C.



Table 14C. Protein Sequence Properties NOV14a

PSort analysis: 0.6400 probability located in plasma membrane; 0.4600 probability located
in Golgi body; 0.3700 probability located in endoplasmic reticulum
(membrane); 0.1000 probability located in endoplasmic reticulum (lumen)

SignalP analysis: Likely cleavage site between residues 38 and 39
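
A predicted cleavage site between residues 38 and 39 implies a signal peptide of 38 residues, with the mature polypeptide beginning at residue 39. The following is a minimal sketch of that split, applied to the first 58 residues of the NOV14a sequence in Table 14A; the function name `split_signal_peptide` is hypothetical.

```python
def split_signal_peptide(protein: str, cleavage_after: int):
    """Split a protein at a predicted cleavage site.

    cleavage_after is the 1-based index of the last signal-peptide residue,
    so a site "between residues 38 and 39" corresponds to cleavage_after=38.
    """
    if not 0 < cleavage_after < len(protein):
        raise ValueError("cleavage site lies outside the sequence")
    return protein[:cleavage_after], protein[cleavage_after:]


# First 58 residues of NOV14a (SEQ ID NO: 44), as given in Table 14A.
nov14a_prefix = "MASALSYVSKFKSFVILFVTPLLLLPLVILMPAKVSCAYVIILMAIYWCTEVIPLAVT"
signal, mature = split_signal_peptide(nov14a_prefix, 38)
print(len(signal), signal[-4:], mature[:4])
```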



A search of the NOV14a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 14D.

Table 14D. Geneseq Results for NOV14a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV14a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAB23625 | Human secreted protein SEQ ID NO: 50 - Homo sapiens, 627 aa. [WO200049134-A1, 24-AUG-2000] | 10..566; 9..623 | 256/623 (41%); 386/623 (61%) | e-137
AAB36158 | Novel human transporter protein SEQ ID NO: 2 - Homo sapiens, 627 aa. [WO200065055-A2, 02-NOV-2000] | 10..566; 9..623 | 256/623 (41%); 386/623 (61%) | e-137
AAB42213 | Human ORFX ORF1977 polypeptide sequence SEQ ID NO:3954 - Homo sapiens, 627 aa. [WO200058473-A2, 05-OCT-2000] | 10..566; 9..623 | 256/623 (41%); 386/623 (61%) | e-136
AAB36164 | Novel human transporter protein SEQ ID NO: 14 - Homo sapiens, 626 aa. [WO200065055-A2, 02-NOV-2000] | 10..566; 9..622 | 252/623 (40%); 382/623 (60%) | e-136
AAB36159 | Novel human transporter protein SEQ ID NO: 4 - Homo sapiens, 627 aa. [WO200065055-A2, 02-NOV-2000] | 10..566; 9..623 | 256/623 (41%); 385/623 (61%) | e-136
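
The Expect Value column mixes ordinary scientific notation (for example 1.2e-12) with a shorthand that omits the mantissa (for example e-137). The following is a minimal sketch of how such entries might be normalized for sorting or filtering, assuming that a bare "e-137" stands for 1e-137; the function name `parse_expect` is hypothetical.

```python
def parse_expect(text: str) -> float:
    """Convert an Expect Value string such as 'e-137', '3e-45' or '0.0' to a float.

    A leading bare exponent ('e-137') is read as 1e-137, which is an
    assumption about the shorthand used in the tables.
    """
    text = text.strip()
    if text.lower().startswith("e"):
        text = "1" + text
    return float(text)


# Expect values as they appear in the tables of this example.
values = ["e-137", "e-136", "0.0", "8.3e-140"]
print(sorted(values, key=parse_expect))
```

Smaller values indicate that a match of the observed quality is less likely to arise by chance, so sorting in ascending order lists the strongest hits first.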



In a BLAST search of public sequence databases, the NOV14a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 14E. 



Table 14E. Public BLASTP Results for NOV14a

Protein Accession Number | Protein/Organism/Length | NOV14a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
O57661 | INTESTINAL SODIUM/LITHIUM-DEPENDENT DICARBOXYLATE TRANSPORTER (NA(+)/DICARBOXYLATE COTRANSPORTER) - Xenopus laevis (African clawed frog), 622 aa. | 1..564; 1..619 | 336/619 (54%); 444/619 (71%) | 0.0
Q9ES88 | NA/DICARBOXYLATE COTRANSPORTER (SOLUTE CARRIER FAMILY 13 (SODIUM-DEPENDENT DICARBOXYLATE TRANSPORTER), MEMBER 2) - Mus musculus (Mouse), 586 aa. | 1..561; 1..567 | 311/572 (54%); 421/572 (73%) | e-179
O35055 | [first part of entry illegible] COTRANSPORTER 1 (NA(+)/DICARBOXYLATE COTRANSPORTER 1) (KIDNEY DICARBOXYLATE TRANSPORTER) (SDCT1) (ORGANIC ANION TRANSPORTER 1) (OAT1) - Rattus norvegicus (Rat), 587 aa. | 1..568; [illegible] | [illegible]; 419/572 (72%) | e-179
Q13183 | Renal sodium/dicarboxylate cotransporter (Na(+)/dicarboxylate cotransporter) - Homo sapiens (Human), 592 aa. | 1..561; 1..572 | 318/581 (54%); 428/581 (72%) | e-179
Q28615 | Renal sodium/dicarboxylate cotransporter (Na(+)/dicarboxylate cotransporter) - Oryctolagus cuniculus (Rabbit), 593 aa. | 1..562; 1..576 | 300/586 (51%); 418/586 (71%) | e-172



Pfam analysis predicts that the NOV14a protein contains the domains shown in
Table 14F.

Table 14F. Domain Analysis of NOV14a

Pfam Domain | NOV14a Match Region | Identities/Similarities for the Matched Region | Expect Value
Na_sulph_symp: domain 1 of 1 | 6..554 | 163/604 (27%); 424/604 (70%) | 8.3e-140



Example 15. 

The NOV15 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 15A.

Table 15A. NOV15 Sequence Analysis

SEQ ID NO: 53 | 1547 bp

NOV15a, CG57732-01 DNA Sequence


AACCCCCTTGACTGAAGCAATGGAGGGGGGTCCAGCTGTCTGCTGCCAGGATCCTCGG
GCAGAGCTGGTAGAACGGGTGGCAGCCATCGATGTGACTCACTTGGAGGAGGCAGATG 
GTGGCCCAGAGCCTACTAGAAACGGTGTGGACCCCCCACCACGGGCCAGAGCTGCCTC 
TGTGATCCCTGG CAGTACTTCAAG ACTGCTCCCAGC CCGG CCT AGCCTCTCAG CCAGG 
AAGCTTTCCCTACAGGAGCGGCCAGCAGGAAGCTATCTGGAGGCGCAGGCTGGGCCTT 
ATGCCACGGGGCCTGCCAGCCACATCTCCCCCCGGGCCTGGCGGAGGCCCACCATCGA 
GTCCC ACC ACGTGG CCATCTCAG ATGCAG AGG ACTGCGTG CAGCTGAACCAGTACAAG 
CTG CAG AGTGAG ATTGG C AAGGG TG C CT A CGG TG TGGTG AGG CTGGC CT AC AACG AAA 
G TG AAG ACAG ACA CT ATG C AATG AAAG TC CTTTCCAAAAAG AAGTTACTG AAG C AG T A 
TGGCTTTCCACGTCGCCCTCCCCCGAGAGGGTCCCAGGCTGCCCAGGGAGGACCAGCC 
AAGCAGCTGCTGCCCCTGGAGCGGGTGTACCAGGAGATTGCCATCCTGAAGAAGCTGG 
ACCACGTGAATGTGGTCAAACTGATCGAGGTACTGGATGACCCAGCTGAGGACAACCT
CTATTTGCCCCGCATCCTTCTCCATAGGCCCGTCATGGAAGTGCCCTGTGACAAGCCC 
TTCTCGGAGGAGCAAGCTCGCCTCTACCTGCGGGACGTCATCCTGGGCCTCGAGTACG 
TGCACTGCCAGAAG ATCGT CCACAGGG AC ATCAAGCCAT CCAACCTGCTCCTGGGGGA 
TGATGGGCACGTGAAGATCGCCGACTTTGGCGTCAGCAACCAGTTTGAGGGGAACGAC
GCTCAG CTGTCCAGCACGG CGGG AACCCCAG CATTCATGGCCC CCGAGG CCATTTCTG 
ATTCCGGCCAGAGCTTCAGTGGGAAGTTGGATGTATGGGCCACTGGCGTCACGTTGTA 
CTGCTTTGTCTATGGGAAGTGCCCATTCATCGACGATTTCATCCTGGCCCTCCACAGG
AAGATCAAGAATGAGCCCGTGGTGTTTCCTGAGGAGCCAGAAATCAGCGAGGAGCTCA
AGGACCTGATCCTGAAGATGTTAGACAAGAATCCCGAGACGAGAATTGGGGTGCCAGA 
CATCAAGTTGCACCCTTGGGTGACCAAGAACGGGGAGGAGCCCCTTCCTTCGGAGGAG 
GAGC ACTGCAGCGTGGTGGAGGTG ACAGAGGAGG AGGTTAAG AACTCAGTCAGG CT CA 
TCCCCAGCTGGACCACGGTCATCCTGGTGAAGTCCATGCTGAGGAAGCGTTCCTTTGG 
GAACCCGTTTG AGC CCCAAG CACGG AGGGAAGAG CGAT CCATGT CTGCTCCAGGAAAC 
CT AGTGGTGAAAGAAGGGTTTGGTG AAGGGGGCAAGAGCCCAGAGCTC C CCGGCGT CC 
AGGAAGACGAGGCTGCATCCTGAGCCCCTGCATGCACCC 




ORF Start: ATG at 20 | ORF Stop: TGA at 1529

SEQ ID NO: 54 | 503 aa | MW at 55606.7kD

NOV15a, CG57732-01 Protein Sequence


MEGGPAVCCQDPRAELVERVAAIDVTHLEEADGGPEPTRNGVDPPPRARAASVIPGST
SRLLPARPSLSARKLSLQERPAGSYLEAQAGPYATGPASHISPRAWRRPTIESHHVAI
SDAEDCVQLNQYKLQSEIGKGAYGVVRLAYNESEDRHYAMKVLSKKKLLKQYGFPRRP
PPRGSQAAQGGPAKQLLPLERVYQEIAILKKLDHVNVVKLIEVLDDPAEDNLYLPRIL
LHRPVMEVPCDKPFSEEQARLYLRDVILGLEYVHCQKIVHRDIKPSNLLLGDDGHVKI
ADFGVSNQFEGNDAQLSSTAGTPAFMAPEAISDSGQSFSGKLDVWATGVTLYCFVYGK
CPFIDDFILALHRKIKNEPVVFPEEPEISEELKDLILKMLDKNPETRIGVPDIKLHPW
VTKNGEEPLPSEEEHCSVVEVTEEEVKNSVRLIPSWTTVILVKSMLRKRSFGNPFEPQ
ARREERSMSAPGNLLVKEGFGEGGKSPELPGVQEDEAAS




SEQ ID NO: 55 | 1611 bp

NOV15b, CG57732-02 DNA Sequence


GCGCCCAGGTTCCCAACAAGGCTACGCAGAAGAACCCCCTTGACTGAAGCAATGGAGG
GGGGTCCAGCTGTCTGCTGCCAGGATCCTCGGGCAGAGCTGGTAGAACGGGTGGCAGC
CATCGATGTGACTCACTTGGAGGAGGCAGATGGTGGCCCAGAGCCTACTAGAAACGGT
GTGGACCCCCCACCACGGGCCAGAGCTGCCTCTGTGATCCCTGGCAGTACTTCAAGAC 
TGCTCCCAGC CCGGCCTAGCCTCT CAGCCAGG AAGCTTTCCCTACAGGAGCGGCCAG C 
AGG AAGCTAT CTGGAGGCGCAGGCTGGG CCTTATGCCACGGGGCCTGCCAG C CACATC 
TCCCCCCGGGCCTGGCGGAGGCCCACCATCGAGTCCCACCACGTGGCCATCTCAGATG 
CAGAGGACTGCGTGCAGCTGAACCAGTACAAGCTGCAGAGTGAGATTGGCAAGGGTGC 
CTACGGTGTGGTGAGGCTGGCCTACAACGAAAGTGAAGACAGACACTATGCAATGAAA 
GTCCTTTCCAAAAAGAAGTTACTGAAGCAGTATGGCTTTCCACGTCGCCCTCCCCCGA 
G AGGG TCCCAGG CTG CCCAGGG AGG AC CAG C CAAG CAGCTG CTG CCCCTGG AG CGGGT 
GT AC CAGG AGATTGCCATCCTG AAGAAGCTGGACCACGTGAATGTGGT CAAACTG ATC 
GAGGTCCTGGATGACCCAGCTGAGGACAACCTCTATTTGGTGTTTGACCTCCTGAGAA 
AGGGGCCCGTCATGGAAGTGCCCTGTGACAAGTCCTTCTCGGAGGAGCAAGCTCGCCT 
CTACCTGCGGGACGTCATCCTGGGCCTCGAGTACTTGCACTGCCAGAAGATCGTCCAC 
AGGGACATCAAG CCATCCAAC CTG CTCCTGGGGGATG ATGGG CACGTGAAG ATCGCCG 
ACTTTGGCGTCAGCAACCAGTTTGAGGGGAACGACGCTCAGCTGTCCAGCACGGCGGG 
AACCCCAGCATTCATGGCCCCCGAGGCCATTTCTGATTCCGGCCAGAGCTTCAGTGGG 
AAGGCCTTGGATGTATGGGCCACTGGCGTCACGCTGTACTGCTTTGTCTATGGGAAGT 
GCCCGTTCATCGACGATTTCATCCTGGCCCTCCACAGGAAGATCAAGAATGAGCCCGT 
GGTGTTTCCTG AGGGGCCAGAAATCAGCGAGGAG CT CAAGGACCTGATCCTGAAG ATG 
TTAG ACAAGAATCCCG AGACGAGAATTGGGGTG C CAG ACATCAAGTTGCACCCTTGGG 
TGACCAAGAACGGGGAGGAGCCCCTTCCTTCGGAGGAGGAGCACTGCAGCGTGGTGGA 
GGTG ACAGAGGAGG AGGTTAAG AACT CAGTCAGG CTCATCCCCAGCTGGACCACGGTG 
ATCCTGGTGAAGTCCATGCTGAGGAAGCGTTCCTTTGGGAACCCGTTTGAGCCCCAAG 
CACGGAGGGAAGAGCGATCCATGTCTGCTCCAGGAAACCTACTGGTGAAAGAAGGGTT 
TGGTGAAGGGGGCAAG AGCC CAG AG CTCCCCGG CGTCCAGGAAG ACGAGG CTGCATCC 
TGAGCCCCTGCATGCACCCAGGGCCACCCGGCAGCACACTCATCC 




ORF Start: ATG at 52 | ORF Stop: TGA at 1567

SEQ ID NO: 56 | 505 aa | MW at 55652.7kD

NOV15b, CG57732-02 Protein Sequence


MEGGPAVCCQDPRAELVERVAAIDVTHLEEADGGPEPTRNGVDPPPRARAASVIPGST
SRLLPARPSLSARKLSLQERPAGSYLEAQAGPYATGPASHISPRAWRRPTIESHHVAI
SDAEDCVQLNQYKLQSEIGKGAYGVVRLAYNESEDRHYAMKVLSKKKLLKQYGFPRRP
PPRGSQAAQGGPAKQLLPLERVYQEIAILKKLDHVNVVKLIEVLDDPAEDNLYLVFDL
LRKGPVMEVPCDKSFSEEQARLYLRDVILGLEYLHCQKIVHRDIKPSNLLLGDDGHVK
IADFGVSNQFEGNDAQLSSTAGTPAFMAPEAISDSGQSFSGKALDVWATGVTLYCFVY
GKCPFIDDFILALHRKIKNEPVVFPEGPEISEELKDLILKMLDKNPETRIGVPDIKLH
PWVTKNGEEPLPSEEEHCSVVEVTEEEVKNSVRLIPSWTTVILVKSMLRKRSFGNPFE
PQARREERSMSAPGNLLVKEGFGEGGKSPELPGVQEDEAAS




SEQ ID NO: 57 | 1725 bp

NOV15c, CG57732-03 DNA Sequence


GCGCCCAGGTTCCCAACAAGGCTACGCAGAAGAACCCCCTTGACTGAAGTAATGGAGG
GGGGTCCAGCTGTCTGCTGCCAGGATCCTCGGGCAGAGCTGGTAGAACGGGTGGCAGC
CATCGATGTGACTCACTTGGAGGAGGCAGATGGTGGCCCAGAGCCTACTAGAAACGGT
GTGGACCCCCCACCACGGGCCAGAGCTGCCTCTGTGATCCCTGGCAGTACTTCAAGAC 
TGCTCCCAGCCCGGCCTAGCCTCTCAGCCAGGAAGCTTTCCCTACAGGAGCGGCCAGC 
AGG AAG CTATCTGGAGGCGCAGGCTGGGCCTTATG CCACGGGGCCTGCCAGCCACATC 
TCCCCCCGGGCCTGGCGGAGGCCCACCATCGAGTCCCACCACGTGGCCATCTCAGATG 
CAGAGGACTGCGTGCAGCTGAACCAGTACAAGCTGCAGAGTGAGATTGGCAAGGGTGC 
CTACGGTGTGGTGAGGCTGGCCTACAACGAAAGTGAAGACAGACACTATGCAATGAAA
GTCCTTTCCAAAAAGAAGTTACTGAAGCAGTATGGCTTTCCACGTCGCCCTCCCCCGA 
GAGGGTCCCAGGCTGCCCAGGGAGGACCAGCCAAGCAGCTGCTGCCCCTGGAGCGGGT 
GTACCAGGAGATTGCCATCCTGAAGAAGCTGGACCACGTGAATGTGGTCAAACTGATC 
GAGGTCCTGGATGACCCGGCTGAGGACAACCTCTATTTGGCCCTGCAGAACCAGGCCC 
AGAATATCCAGTTAGATTCAACAAATATCGCCAAGTCCCACTCCCTGCTTCCCTCTGA 
GCAG CAAGACAGTGG AT CCACGTGGGCTGCGCGCTCAGTGTTTGACCTCCTGAG AAAG 
GGGCCCGTCATGGAAGTGCCCTGTGACAAGCCCTTCTCGGAGGAGCAAGCTCGCCTCT 
AC CTGCGGG ACGT CATCCTGGGCCT CG AGTACTTG CACTG CCAGAAGAT CGTCCACAG 
GGACATCAAGCCATCCAACCTGCTCCTGGGGGATGATGGGCACGTGAAGATCGCCGAC 
TTTGGCGTCAGCAACCAGTTTGAGGGGAACGACGCTCAGCTGTCCAGCACGGCGGGAA 
CCCCAGCATTCATGGCCCCCGAGGCCATTTCTGATTCCGGCCAGAGCTTCAGTGGGAA 
GGCCTTGGATGTATGGGCCACTGGCGTCACGTTGTACTGCTTTGTCTATGGGAAGTGC 
CCGTTCATCGACGATTTCATCCTGGCCCT CC ACAGG AAGACCAAGAATGAG C CCGTGG 
TGTTTCCTGAGGGGCCAGAAATCAGCGAGGAGCTCAAGGACCTGATCCTGAAGATGTT 
AGACAAGAATCCCGAGACGAGAATTGGGGTGCCAGACATCAAGTTGCACCCTTGGGTG 
ACCAAGAACGGGGAGGAGCCCCTTCCTTCGGAGGAGGAGCACTGCAGCGTGGTGGAGG 
TGACAGAGGAGGAGGTTAAGAACTCAGTCAGGCTCATCCCCAGCTGGACCACGGTGAT 
CCTGGTGAAGTCCATGCTGAGGAAGCGTTCCTTTGGGAACCCGTTTGAGCCCCAAGCA 
CGGAGGGAAGAGCGATCCATGTCTGCTCCAGGAAACCTACTGGTGAAAGAAGGGTTTG 
GTGAAGGGGGCAAGAGCCCAGAGCTCCCCGGCGTCCAGGAAGACGAGGCTGCATCCTG 
AGCCCCTGCATGCACCCAGGGCCACCCGGCAGCACACTCATCC 




ORF Start: ATG at 52 | ORF Stop: TGA at 1681

SEQ ID NO: 58 | 543 aa | MW at 59729.0kD

NOV15c, CG57732-03 Protein Sequence


MEGGPAVCCQDPRAELVERVAAIDVTHLEEADGGPEPTRNGVDPPPRARAASVIPGST
SRLLPARPSLSARKLSLQERPAGSYLEAQAGPYATGPASHISPRAWRRPTIESHHVAI
SDAEDCVQLNQYKLQSEIGKGAYGVVRLAYNESEDRHYAMKVLSKKKLLKQYGFPRRP
PPRGSQAAQGGPAKQLLPLERVYQEIAILKKLDHVNVVKLIEVLDDPAEDNLYLALQN
QAQNIQLDSTNIAKSHSLLPSEQQDSGSTWAARSVFDLLRKGPVMEVPCDKPFSEEQA
RLYLRDVILGLEYLHCQKIVHRDIKPSNLLLGDDGHVKIADFGVSNQFEGNDAQLSST
AGTPAFMAPEAISDSGQSFSGKALDVWATGVTLYCFVYGKCPFIDDFILALHRKTKNE
PVVFPEGPEISEELKDLILKMLDKNPETRIGVPDIKLHPWVTKNGEEPLPSEEEHCSV
VEVTEEEVKNSVRLIPSWTTVILVKSMLRKRSFGNPFEPQARREERSMSAPGNLLVKE
GFGEGGKSPELPGVQEDEAAS



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 15B. 



Table 15B. Comparison of NOV15a against NOV15b through NOV15c.

Protein Sequence | NOV15a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV15b | 1..503; 1..505 | 495/505 (98%); 497/505 (98%)
NOV15c | 1..503; 1..543 | 492/543 (90%); 495/543 (90%)
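
Residue ranges in these tables are written in the form "first..last", with both ends inclusive. The following is a minimal sketch of how such a range might be parsed and its length computed; the function names `parse_range` and `span_length` are hypothetical.

```python
def parse_range(text: str):
    """Parse a residue range such as '1..503' into a (first, last) tuple."""
    first, last = text.strip().split("..")
    return int(first), int(last)


def span_length(text: str) -> int:
    """Number of residues covered by an inclusive 'first..last' range."""
    first, last = parse_range(text)
    if last < first:
        raise ValueError("range end precedes range start")
    return last - first + 1


# Ranges as they appear in the comparison tables of this example.
print(span_length("1..503"), span_length("1..543"))
```

Because the two ranges of an alignment can differ in length, the denominator of the identity figure (the alignment length, including gaps) may exceed either span.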



Further analysis of the NOV15a protein yielded the following properties shown in
Table 15C.



Table 15C. Protein Sequence Properties NOV15a

PSort analysis: 0.7600 probability located in nucleus; 0.3000 probability located in
microbody (peroxisome); 0.1000 probability located in mitochondrial
matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV15a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 15D.



Table 15D. Geneseq Results for NOV15a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV15a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAU03510 | Human protein kinase #10 - Homo sapiens, 513 aa. [WO200138503-A2, 31-MAY-2001] | 1..503; 1..513 | 496/513 (96%); 498/513 (96%) | 0.0
AAE04361 | Human kinase (PKIN)-2 - Homo sapiens, 513 aa. [WO200146397-A2, 28-JUN-2001] | 1..503; 1..513 | 496/513 (96%); 498/513 (96%) | 0.0
AAY44239 | Human cell signalling protein-2 - Homo sapiens, 540 aa. [WO9958558-A2, 18-NOV-1999] | 64..500; 90..538 | 289/450 (64%); 367/450 (81%) | e-165
AAM40450 | Human polypeptide SEQ ID NO 5381 - Homo sapiens, 680 aa. [WO200153312-A1, 26-JUL-2001] | 64..482; 128..558 | 283/432 (65%); 356/432 (81%) | e-162
AAM40449 | Human polypeptide SEQ ID NO 5380 - Homo sapiens, 680 aa. [WO200153312-A1, 26-JUL-2001] | 64..482; 128..558 | 283/432 (65%); 356/432 (81%) | e-162



In a BLAST search of public sequence databases, the NOV15a protein was found to
have homology to the proteins shown in the BLASTP data in Table 15E.



Table 15E. Public BLASTP Results for NOV15a

Protein Accession Number | Protein/Organism/Length | NOV15a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q9BQH3 | HYPOTHETICAL 55.7 KDA PROTEIN - Homo sapiens (Human), 505 aa. | 1..503; 1..505 | 497/505 (98%); 499/505 (98%) | 0.0
P97756 | [first part of entry illegible] PROTEIN KINASE IV KINASE ISOFORM - Rattus norvegicus (Rat), 505 aa. | [illegible]; 1..505 | [illegible]; 478/505 (94%) | 0.0
AAH17529 | SIMILAR TO CALCIUM/CALMODULIN-DEPENDENT PROTEIN KINASE KINASE 1, ALPHA - Mus musculus (Mouse), 505 aa. | 1..503; 1..505 | 464/505 (91%); 478/505 (93%) | 0.0
Q64572 | CA2+/CALMODULIN-DEPENDENT PROTEIN KINASE KINASE (EC 2.7.1.37) - Rattus norvegicus (Rat), 505 aa. | 1..503; 1..505 | 463/505 (91%); 476/505 (93%) | 0.0
[accession illegible] | CALCIUM/CALMODULIN DEPENDENT PROTEIN KINASE KINASE ALPHA - Mus musculus (Mouse), 505 aa. | 1..503; 1..505 | 454/505 (89%); 471/505 (92%) | 0.0



Pfam analysis predicts that the NOV15a protein contains the domains shown in
Table 15F.



Table 15F. Domain Analysis of NOV15a

Pfam Domain | NOV15a Match Region | Identities/Similarities for the Matched Region | Expect Value
Pkinase: domain 1 of 2 | 128..228 | 28/101 (28%); 81/101 (80%) | 8.4e-16
Pkinase: domain 2 of 2 | 245..407 | 70/201 (35%); 129/201 (64%) | 1.7e-52



Example 16. 

The NOV16 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 16A.

Table 16A. NOV16 Sequence Analysis

SEQ ID NO: 59 | 688 bp

NOV16a, CG57709-01 DNA Sequence


GACGCGGACC CG CCATGGCGCGG AAG AAGGTG CGTCCGCGGCTG ATCGCGGAG CTGGC 
CCGCCGCGTGCGCGCCCTGCGGGAGCAACTGAACAGGCCGCGCGACTCCCAGCTCTAC 
GCGGTGGACTACGAGACCTTGACGCGGCCGTTCTCTGGACGCCGGCTGCCGGTCCGGG 
CCTGGGCCGACGTGCGCCGCGAGAGCCGCCTCTTGCAGCTGCTCGGCCGCCTCCCGCT 
CTTCGGCCTGGGCCGCCTGGTCACGCGCAAGTCCTGGCTGTGGCAGCACGACGAGCCG 
TGCTACTGGCGCCTCACGCGGGTGCGGCCCGACTACACGGCGCAGAACTTGGACCACG
GGAAGGCCTGGGGC ATCCTGAC CTTCAAAGGTAAGG CTCGGG AG AGCGCG CGGG AG AT 
CG AACACGTCATGT ACCATGACTGG CGGCTGGTGCC CAAG CACG AGG AGG AGG CCTTC 
ACCGCGTTCACGCCGGCGCCGGAAGACAGCCTGGCCTCCGTGCCGTACCCGCCTCTCC 
TCCGGGCCATGATTATCGCAGAACGACAGAAAAATGGAGACACAAGCACCGAGGAGCC 
CATGCTGAATGTGCAGAGGATACGCATGGAACCCTGGGATTACCCTGCAAAACAGGAA 
GACAAAGGAAGGGCCAAGGGCACCCCCGTCTAGAATGCCAGAACCAGCGG

ORF Start: ATG at 15 | ORF Stop: TAG at 669

SEQ ID NO: 60 | 218 aa | MW at 25647.2kD

NOV16a, CG57709-01 Protein Sequence


MARKKVRPRLIAELARRVRALREQLNRPRDSQLYAVDYETLTRPFSGRRLPVRAWADV
RRESRLLQLLGRLPLFGLGRLVTRKSWLWQHDEPCYWRLTRVRPDYTAQNLDHGKAKG
ILTFKGKARESAREIEHVMYHDWRLVPKHEEEAFTAFTPAPEDSLASVPYPPLLRAMI
IAERQKNGDTSTEEPMLNVQRIRMEPWDYPAKQEDKGRAKGTPV 
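
The predicted polypeptide follows from the nucleotide sequence by reading codons from the stated ORF start. The following is a minimal sketch of that translation using the standard genetic code, applied to the first 58 bases of the NOV16a DNA sequence with the stated ORF start at position 15; the function name `translate_orf` is hypothetical, and the sketch assumes a clean sequence containing only A, C, G and T.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_orf(dna: str, orf_start: int) -> str:
    """Translate from a 1-based ORF start until a stop codon or end of input.

    Raises KeyError if a codon contains a character other than A, C, G or T.
    """
    seq = dna.upper()
    protein = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the ORF
            break
        protein.append(aa)
    return "".join(protein)


# First 58 bases of NOV16a (SEQ ID NO: 59), with OCR spacing removed.
nov16a_prefix = "GACGCGGACCCGCCATGGCGCGGAAGAAGGTGCGTCCGCGGCTGATCGCGGAGCTGGC"
print(translate_orf(nov16a_prefix, 15))
```

The translated fragment can be compared with the beginning of the NOV16a protein sequence listed in Table 16A.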



Further analysis of the NOV16a protein yielded the following properties shown in
Table 16B.



Table 16B. Protein Sequence Properties NOV16a

PSort analysis: 0.9081 probability located in mitochondrial matrix space; 0.6000 probability
located in mitochondrial inner membrane; 0.6000 probability located in
mitochondrial intermembrane space; 0.6000 probability located in mitochondrial
outer membrane

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV16a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 16C.



Table 16C. Geneseq Results for NOV16a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV16a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAG81356 | Human AFP protein sequence SEQ ID NO:230 - Homo sapiens, 218 aa. [WO200129221-A2, 26-APR-2001] | 1..218; 1..218 | 212/218 (97%); 212/218 (97%) | e-125
AAU30525 | Novel human secreted protein #1016 - Homo sapiens, 85 aa. [WO200179449-A2, 25-OCT-2001] | 135..218; 1..84 | 84/84 (100%); 84/84 (100%) | 3e-45
AAU30526 | Novel human secreted protein #1017 - Homo sapiens, 62 aa. [WO200179449-A2, 25-OCT-2001] | 187..217; 12..42 | 31/31 (100%); 31/31 (100%) | 4e-12



In a BLAST search of public sequence databases, the NOV16a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 16D. 



Table 16D. Public BLASTP Results for NOV16a 

Protein Accession Number | Protein/Organism/Length | NOV16a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q9BVI7 | HYPOTHETICAL 25.7 KDA PROTEIN - Homo sapiens (Human), 218 aa. | 1..218; 1..218 | 214/218 (98%); 214/218 (98%) | e-125
P82930 | MITOCHONDRIAL 28S RIBOSOMAL PROTEIN S34 (MRP-S34) - Homo sapiens (Human), 218 aa. | 1..218; 1..218 | 213/218 (97%); 213/218 (97%) | e-124
CAC38606 | SEQUENCE 229 FROM PATENT WO0129221 - Homo sapiens (Human), 218 aa. | 1..218; 1..218 | 212/218 (97%); 212/218 (97%) | e-124
Q9JDC9 | TCE2 (0610007F04RIK PROTEIN) - Mus musculus (Mouse), 218 aa. | 1..218; 1..218 | 194/218 (88%); 205/218 (93%) | e-114
Q9D957 | 0610007F04RIK PROTEIN - Mus musculus (Mouse), 218 aa. | 1..218; 1..218 | 193/218 (88%); 205/218 (93%) | e-114
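
The Expect values in these tables use a shorthand in which the leading mantissa is dropped (for example `e-125`). A minimal sketch of how such strings might be converted to numbers for sorting follows; the function name `parse_expect` is hypothetical, and reading `e-125` as 1e-125 is an assumption based on common BLAST report formatting.

```python
def parse_expect(text: str) -> float:
    """Convert an Expect value such as 'e-125', '3e-45' or '0.0' to a float."""
    cleaned = text.strip().lower()
    if cleaned.startswith("e"):
        # Assumed convention: a bare exponent stands for a mantissa of 1.
        cleaned = "1" + cleaned
    return float(cleaned)


# Expect values copied from Table 16D, paired with their accession numbers.
hits = {"Q9BVI7": "e-125", "P82930": "e-124", "Q9JDC9": "e-114"}

# Smaller Expect values indicate matches less likely to arise by chance.
for accession in sorted(hits, key=lambda acc: parse_expect(hits[acc])):
    print(accession, parse_expect(hits[accession]))
```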



Pfam analysis predicts that the NOV16a protein contains the domains shown in 
Table 16E. 



Table 16E. Domain Analysis of NOV16a 

Pfam Domain | NOV16a Match Region | Identities/Similarities for the Matched Region | Expect Value
No Significant Matches Found 



Example 17. 

The NOV17 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 17A. 



Table 17A. NOV17 Sequence Analysis 



SEQ ID NO: 61 



894 bp 



NOV17a, 

CG57700-01 DNA Sequence 



CTCCGTGACCATGAAGGTCAAGGTCATCCCCGTGCTCGAGGACAACTACATGTACCTG 
GTCATCGAGGAGCTCACGCGCGAGGCGGTGGCCGTGGACGTGGCTGTGCCCAAGAGGC 
TGCTGGAGATCGTGGGCCGGGAGGGGGTGTCTCTGACCGCTGTGCTGACCACCCACCA 
TCACTGGGACCACGCGCGGGGAAACCCGGAGCTGGCGCGGCTTCGTCCCGGGCTGGCG 
GTGCTGGGCGCGGACGAGCGCATCTTCTCGCTGACGCGCAGGCTGGCGCACGGCGAGG 
AGCTGCAGTTCGGGGCCATCCACGTGCGTTGCCTCCTGACGCCCGGCCACACCGCCGG 
CCACATGAGCTACTTCCTGTGGGAGGACGATTGCCCGGACCCACCCGCCCTGTTCTCG 
GGTGGCGACGCGCTGTCGGTGGCCGGCTGCGGCTCGTGCCTGGAGGGCAGCGCCCAGC 
AGATGTACCAGAGCCTGGCCGAGCTGGGTACCCTGCCCCCCGAGACGAAGGTGTTCTG 
CGGCCACGAGCACACGCTTAGCAACCTGGAGTTTGCCCAGAAAGTGGAGCCCTGCAAC 
GACCACGTGAGAGCCAAGCTGTCCTGGGCTCAGAAGAGGGATGAGGATGACGTGCCCA 
CTGTGCCGTCGACTCTGGGCGAGGAGCGCCTCTACAACCCCTTCCTGCGGGTGGCGGA 
GGAGCCGGTGCGCAAGTTCACGGGCAAGGCGGTCCCCGCCGACGTCCTGGAGGCGCTA 
TGCAAGGAGCGGGCGCGCTTCGAACAGGCGGGCGAGCCGCGGCAGCCACAGGCGCGGG 
CCCTCCTTGCGCTGCAGTGGGGGCTCCTGAGTGCAGCCCCACACGACTGAGCCACCCA 
GACCCTCACAGGGCTGGGGCCTGC 




ORF Start: ATG at 11 


ORF Stop: TGA at 860 




SEQ ID NO: 62 


283 aa 


MW at 31262.3kD 


NOV17a, 

CG57700-01 Protein Sequence 


MKVKVIPVLEDNYMYLVIEELTREAVAVDVAVPKRLLEIVGREGVSLTAVLTTHHHWD 
HARGNPELARLRPGLAVLGADERIFSLTRRLAHGEELQFGAIHVRCLLTPGHTAGHMS 
YFLWEDDCPDPPALFSGGDALSVAGCGSCLEGSAQQMYQSLAELGTLPPETKVFCGHE 
HTLSNLEFAQKVEPCNDHVRAKLSWAQKRDEDDVPTVPSTLGEERLYNPFLRVAEEPV 
RKFTGKAVPADVLEALCKERARFEQAGEPRQPQARALLALQWGLLSAAPHD 




SEQ ID NO: 63 


888 bp 


NOV17b, 

CG57700-02 DNA Sequence 


CTCCGTGACCATGAAGGTCAAGGTCATCCCCGTGCTCGAGGACAACTACATGTACCTG 
GTCATCGAGGAGCTCACGCGCGAGGCGGTGGCCGTGGACGTGGCTGTGCCCAAGAGGC 
TGCTGGAGATCGTGGGCCGGGAGGGGGTGTCTCTGACCGCTGTGCTGACCACCCACCA 
TCACTGGGACCACGCGCGGGGAAACCCGGAGCTGGCGCGGCTTCGTCCCGGGCTGGCG 
GTGCTGGGCGCGGACGAGCGCATCTTCTCGCTGACGCGCAGGCTGGCGCACGGCGAGG 
AGCTGCAGTTCGGGGCCATCCACGTGCGTTGCCTCCTGACGCCCGGCCACACCGCCGG 
CCACATGAGCTACTTCCTGTGGGAGGACGATTGCCCGGACCCACCCGCCCTGTTCTCG 
GGCGACGCGCTGTCGGTGGCCGGCTGCGGCTCGTGCCTGGAGGGCAGCGCCCAGCAGA 
TGTACCAGAGCCTGGCCGAGCTGGGTACCCTGCCCCCCGAGACGAAGGTGTTCTGCGG 
CCACGAGCACACACTTAGCAACCTGGAGTTTGCCCAGAAAGTGGAGCCCTGCAACGAC 
CACGTGAGAGCCAAGCTGTCCTGGGCTAAGAAGAGGGATGAGGATGACGTGCCCACTG 
TGCCGTCGACTCTGGGCGAGGAGCGCCTCTACAACCCCTTCCTGCGGGTGGCAGAGGA 
GCCGGTGCGCAAGTTCACGGGCAAGGCGGTCCCCGCCGACGTCCTGGAGGCGCTATGC 
AAGGAGCGGGCGCGCTTCGAACAGGCGGGCGAGCCGCGGCAGCCACAGGCGCGGGCCC 
TCCTTGCGCTGCAGTGGGGGCTCCTGAGTGCAGCCCCACACGACTGAGCCACCCAGAC 
CCTCACAGGGCTGGGCCT 




ORF Start: ATG at 11 


ORF Stop: TGA at 857 




SEQ ID NO: 64 


282 aa 


MW at 31205.3kD 


NOV17b, 

CG57700-02 Protein Sequence 


MKVKVIPVLEDNYMYLVIEELTREAVAVDVAVPKRLLEIVGREGVSLTAVLTTHHHWD 
HARGNPELARLRPGLAVLGADERIFSLTRRLAHGEELQFGAIHVRCLLTPGHTAGHMS 
YFLWEDDCPDPPALFSGDALSVAGCGSCLEGSAQQMYQSLAELGTLPPETKVFCGHEH 
TLSNLEFAQKVEPCNDHVRAKLSWAKKRDEDDVPTVPSTLGEERLYNPFLRVAEEPVR 
KFTGKAVPADVLEALCKERARFEQAGEPRQPQARALLALQWGLLSAAPHD 




SEQ ID NO: 65 


882 bp 


NOV17c, 

CG57700-03 DNA Sequence 


ACCATGAAGGTCAAGGTCATCCCCGTGCTCGAGGACAACTACATGTACCTGGTCATCG 
AGGAGCTCACGCGCGAGGCGGTGGCCGTGGACGTGGCTGTGCCCAAGAGGCTGCTGGA 
GATCGTGGGCCGGGAGGGGGTGTCTCTGACCGCTGTGCTGACCACCCACCATCACTGG 
GACCACGCGCGGGGAAACCCGGAGCTGGCGCGGCTTCGTCCCGGGCTGGCGGTGCTGG 
GCGCGGACGAGCGCATCTTCTCGCTGACGCGCAGGCTGGCGCACGGCGAGGAGCTGCG 
GTTCGGGGCCATCCACGTGCGTTGCCTCCTGACGCCCGGCCACACCGCCGGCCACATG 
AGCTACTTCCTGTGGGAGGACGATTGCCCGGACCCACCCGCCCTGTTCTCGGGCGACG 
CGCTGTCGGTGGCCGGCTGCGGCTCGTGCCTGGAGGGCAGCGCCCAGCAGATGTACCA 
GAGCCTGGCCGAGCTGGGTACCCTGCCCCCCGAGACGAAGGTGTTCTGCGGCCACGAG 
CACACGCTTAGCAACCTGGAGTTTGCCCAGAAAGTGGAGCCCTGCAACGACCACGTGA 
GAGCCAAGCTGTCCTGGGCTAAGAAGAGGGATGAGGATGACGTGCCCACTGTGCCGTC 
GACTCTGGGCGAGGAGCGCCTCTACAACCCCTTCCTGCGGGTGGCAGAGGAGCCGGTG 
CGCAAGTTCACGGGCAAGGCGGTCCCCGCCGACGTCCTGGAGGCGCTATGCAAGGAGC 
GGGCGCGCTCCGAACAGGCGGGCGAGCCGCGGCAGCCACAGGCGCGGGCCCTCCTTGC 
GCTGCAGTGGGGGCTCCTGAGTGCAGCCCCACACGACTGAGCCACCCAGACCCTCACA 
GGGCTGGGGCCT 




ORF Start: ATG at 4 


ORF Stop: TGA at 850 




SEQ ID NO: 66 


282 aa 


MW at 31173.2kD 


NOV17c, 

CG57700-03 Protein Sequence 


MKVKVIPVLEDNYMYLVIEELTREAVAVDVAVPKRLLEIVGREGVSLTAVLTTHHHWD 
HARGNPELARLRPGLAVLGADERIFSLTRRLAHGEELRFGAIHVRCLLTPGHTAGHMS 
YFLWEDDCPDPPALFSGDALSVAGCGSCLEGSAQQMYQSLAELGTLPPETKVFCGHEH 
TLSNLEFAQKVEPCNDHVRAKLSWAKKRDEDDVPTVPSTLGEERLYNPFLRVAEEPVR 
KFTGKAVPADVLEALCKERARSEQAGEPRQPQARALLALQWGLLSAAPHD 




SEQ ID NO: 67 


855 bp 


NOV17d, 

CG57700-04 DNA Sequence 


ACCATGAAGGTCAAGGTCATCCCCGTGCTCGAGGACAACTACATGTACCTGGTCATCG 
AGGAGCTCACGCGCGAGGCGGTGGCCGTGGACGTGGCTGTGCCCAAGAGGCTGCTGGA 
GATCGTGGGCCGGGAGGGGGTGTCTCTGACCGCTGTGCTGACCACCCACTATCACTGG 
GACCACGCGCGGGGAAACCCGGAGCTGGCGCGGCTTCGTCCCGGGCTGGCGGTGCTGG 
GCGCGGACGAGCGCATCTTCTCGCTGACGCGCAGGCTGGCGCACGGCGAGGAGCTGCG 
GTTCGGGGCCATCCACGTGCGTTGCCTCCTGACGCCCGGCCACACCGCCGGCCACATG 
AGCTACTTCCTGTGGGAGGACGATTGCCCGGACCCACCCGCCCTGTTCTCGGGCGACG 
CGCTGTCGGTGGCCGGCTGCGGCTCGTGCCTGGAGGGCAGCGCCCAGCAGATGTACCA 
GAGCCTGGCCGAGCTGGGTACCCTGCCCCCCGAGACGAAGGTGTTCTGCGGCCACGAG 
CACACGCTTAGCAACCTGGAGTTTGCCCAGAAAGTGGAGCCCTGCAACGACCACAAGA 
GGGATGAGGATGACGTGCCCACTGTGCCGTCGACTCTGGGCGAGGAGCGCCTCTACAA 
CCCCTTCCTGCGGGTGGCAGAGGAGCCGGTGCGCAAGTTCACGGGCAAGGCGGTCCCC 
GCCGACGTCCTGGAGGCGCTATGCAAGGAGCGGGCGCGCTTCGAACAGGCGGGCGAGC 
CGCGGCAGCCACAGGCGCGGGCCCTCCTTGCGCTGCAGTGGGGGCTCCTGAGTGCAGC 
CCCACACGACTGAGCCACCCAGACCCTCACAGGGCTGGGGCCT 




ORF Start: ATG at 4 


ORF Stop: TGA at 823 




SEQ ID NO: 68 


273 aa MW at 30219.1kD 


NOV17d, 

CG57700-04 Protein Sequence 


MKVKVIPVLEDNYMYLVIEELTREAVAVDVAVPKRLLEIVGREGVSLTAVLTTHYHWD 
HARGNPELARLRPGLAVLGADERIFSLTRRLAHGEELRFGAIHVRCLLTPGHTAGHMS 
YFLWEDDCPDPPALFSGDALSVAGCGSCLEGSAQQMYQSLAELGTLPPETKVFCGHEH 
TLSNLEFAQKVEPCNDHKRDEDDVPTVPSTLGEERLYNPFLRVAEEPVRKFTGKAVPA 
DVLEALCKERARFEQAGEPRQPQARALLALQWGLLSAAPHD 
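
The relationship between each DNA sequence in Table 17A and its predicted polypeptide can be illustrated by translating from the stated ORF start. The following is a minimal sketch, assuming the standard genetic code and a 1-based ORF start position; the names `CODON_TABLE` and `translate` are hypothetical, and only the first printed line of the NOV17d DNA sequence is used.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, orf_start: int) -> str:
    """Translate from a 1-based ORF start until a stop codon or the end of the input."""
    seq = "".join(dna.split()).upper()  # tolerate stray whitespace in the listing
    protein = []
    for pos in range(orf_start - 1, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First line of the NOV17d DNA sequence (SEQ ID NO: 67); the ORF starts at position 4.
first_line = "ACCATGAAGGTCAAGGTCATCCCCGTGCTCGAGGACAACTACATGTACCTGGTCATCG"
print(translate(first_line, 4))
```

The printed residues can be compared against the start of the NOV17d protein sequence (SEQ ID NO: 68). A codon containing a character other than A, C, G, or T raises a `KeyError`, which makes extraction damage in a listing easy to spot.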



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 17B. 



Table 17B. Comparison of NOV17a against NOV17b through NOV17d. 

Protein Sequence | NOV17a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV17b | 1..283; 1..282 | 281/283 (99%); 282/283 (99%)
NOV17c | 1..283; 1..282 | 279/283 (98%); 281/283 (98%)
NOV17d | 1..283; 1..273 | 271/283 (95%); 273/283 (95%)



Further analysis of the NOV17a protein yielded the following properties shown in 
Table 17C. 



Table 17C. Protein Sequence Properties NOV17a 

PSort analysis: | 0.4500 probability located in cytoplasm; 0.3000 probability located in microbody (peroxisome); 0.1682 probability located in lysosome (lumen); 0.1000 probability located in mitochondrial matrix space
SignalP analysis: | No Known Signal Sequence Predicted 
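
The PSort line in Table 17C lists localization probabilities as a semicolon-separated string. A small sketch of how that string could be split into (probability, location) pairs and ranked is given below; `parse_psort` is a hypothetical helper and the parsing relies on the exact phrase "probability located in" used in these tables.

```python
def parse_psort(text: str) -> list[tuple[float, str]]:
    """Split a PSort summary into (probability, location) pairs."""
    results = []
    for part in text.split(";"):
        part = " ".join(part.split())  # collapse line breaks and repeated spaces
        if not part:
            continue
        probability, location = part.split(" probability located in ")
        results.append((float(probability), location))
    return results


# PSort summary copied from Table 17C.
psort_nov17a = (
    "0.4500 probability located in cytoplasm; "
    "0.3000 probability located in microbody (peroxisome); "
    "0.1682 probability located in lysosome (lumen); "
    "0.1000 probability located in mitochondrial matrix space"
)

best_probability, best_location = max(parse_psort(psort_nov17a))
print(best_location, best_probability)
```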



A search of the NOV17a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 17D. 

Table 17D. Geneseq Results for NOV17a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV17a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAW80783 | Human bisphosphonate binding protein, DP1 (hDP1) - Homo sapiens, 260 aa. [WO9836064-A1, 20-AUG-1998] | 1..256; 1..256 | 128/257 (49%); 184/257 (70%) | 6e-72
AAG10987 | Arabidopsis thaliana protein fragment SEQ ID NO: 9531 - Arabidopsis thaliana, 258 aa. [EP1033405-A2, 06-SEP-2000] | 1..245; 1..246 | 107/248 (43%); 160/248 (64%) | 5e-53
AAG10986 | Arabidopsis thaliana protein fragment SEQ ID NO: 9530 - Arabidopsis thaliana, 268 aa. [EP1033405-A2, 06-SEP-2000] | 1..245; 11..256 | 107/248 (43%); 160/248 (64%) | 5e-53
AAM78721 | Human protein SEQ ID NO 1383 - Homo sapiens, 385 aa. [WO200157190-A2, 09-AUG-2001] | 1..226; 119..344 | 100/227 (44%); 135/227 (59%) | 6e-45
AAY71110 | Human Hydrolase protein-8 (HYDRL-8) - Homo sapiens, 361 aa. [WO200028045-A2, 18-MAY-2000] | 1..226; 95..320 | 100/227 (44%); 135/227 (59%) | 6e-45



In a BLAST search of public sequence databases, the NOV17a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 17E. 

Table 17E. Public BLASTP Results for NOV17a 

Protein Accession Number | Protein/Organism/Length | NOV17a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q9BT45 | SIMILAR TO RIKEN CDNA 1500017E18 GENE - Homo sapiens (Human), 282 aa. | 1..283; 1..282 | 280/283 (98%); 282/283 (98%) | e-163
Q9DB32 | 1500017E18RIK PROTEIN - Mus musculus (Mouse), 283 aa. | 1..278; 1..278 | 231/279 (82%); 251/279 (89%) | e-133
Q96S11 | SIMILAR TO HAGH - Homo sapiens (Human), 218 aa. | 1..228; 1..218 | 217/228 (95%); 218/228 (95%) | e-123
Q96MR5 | CDNA FLJ30279 FIS, CLONE BRACE2002772, MODERATELY SIMILAR TO HYDROXYACYLGLUTATHIONE HYDROLASE (EC 3.1.2.6) - Homo sapiens (Human), 202 aa. | 1..133; 1..133 | 132/133 (99%); 133/133 (99%) | 3e-73
O35952 | Hydroxyacylglutathione hydrolase (EC 3.1.2.6) (Glyoxalase II) (Glx II) (Round spermatid protein RSP29) - Rattus norvegicus (Rat), 260 aa. | 1..256; 1..256 | 128/257 (49%); 184/257 (70%) | 1e-71



Pfam analysis predicts that the NOV17a protein contains the domains shown in 
Table 17F. 



Table 17F. Domain Analysis of NOV17a 

Pfam Domain | NOV17a Match Region | Identities/Similarities for the Matched Region | Expect Value
lactamase B: domain 1 of 1 | 7..173 | 55/221 (25%); 129/221 (58%) | 5.8e-32



Example 18. 



The NOV18 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 18A. 



Table 18A. NOV18 Sequence Analysis 

SEQ ID NO: 69 


2109 bp 


NOV18a, 

CG58553-01 DNA Sequence 


GGGTCCGGCGGGCATCGGCAAGACCATGGCGGCCAAAAATATCCTGTACGACTGGGCG 


GCGGGCAAGCTGTACCAGGGCCAGGTGGACTTCGCCTTCTTCATGCCCTGCGGCGAGC 
TGCTGGAGAGGCCGGGCACGCGCAGCCTGGCTGACCTGATCCTGGACCAGTGCCCCGA 
CCGCGGCGCGCCGGTGCCGCAGATGCTGGCCCAGCCGCAGCGGCTGCTCTTCATCCTG 
GACGGCGCGGACGAGCTGCCGGCGCTGGGGGGCCCCGAGGCCGCGCCCTGCACAGACC 
CCTTCGAGGCGGCGAGCGGCGCGCGGGTGCTAGGCGGGCTGCTGAGTAAGGCGCTGCT 
GCCCACGGCCCTCCTGCTGGTGACCACGCGCGCCGCCGCCCCCGGGAGGCTGCAGGGC 
CGCCTGTGTTCCCCGCAGTGCGCCGAGGTGCGCGGCTTCTCCGACAAGGACAAGAAGA 
AGTATTTCTACAAGTTCTTCCGGGATGAGAGGAGGGCCGAGCGCGCCTACCGCTTCGT 
GAAGGAGAACGAGACGCTGTTCGCGCTGTGCTTCGTGCCCTTCGTGTGCTGGATCGTG 
TGCACCGTGCTGCGCCAGCAGCTGGAGCTCGGTCGGGACCTGTCGCGCACGTCCAAGA 
CCACCACGTCAGTGTACCTGCTTTTCATCACCAGCGTTCTGAGCTCGGCTCCGGTAGC 
CGACGGGCCCCGGTTGCAGGGCGACCTGCGCAATCTGTGCCGCCTGGCCCGCGAGGGC 
GTCCTCGGACGCAGGGCGCAGTTTGCCGAGAAGGAACTGGAGCAACTGGAGCTTCGTG 
GCTCCAAAGTGCAGACGCTGTTTCTCAGCAAAAAGGAGCTGCCGGGCGTGCTGGAGAC 
AGAGGTCACCTACCAGTTCATCGACCAGAGCTTCCAGGAGTCCTTCGCGGCACTGTCC 
TACCTGCTGGAGGACGGCGGGGTGCCCAGGACCGCGGCTGGCGGCGTTGGGACACTCC 
TGCGTGGGGACGCCCAGCCGCACAGCCACTTGGTGCTCACCACGCGCTTCCTCTTCGG 
ACTGCTGAGCGCGGAGCGGATGCGCGACATCGAGCGCCACTTCGGCTGCATGGTTTCA 
GAGCGTGTGAAGCAGGAGGCCCTGCGGTGGGTGCAGGGACAGGGACAGGGCTGCCCCG 
GAGTGGCACCAGAGGTGACCGAGGGGGCCAAAGGGCTCGAGGACACCGAAGAGCCAGA 
GGAGGAGGAGGAGGGAGAGGAGCCCAACTACCCACTGGAGTTGCTGTACTGCCTGTAC 
GAGACGCAGGAGGACGCGTTTGTGCGCCAAGCCCTGGGCCGGTTCCCGGAGCTGGCGC 
TGCAGCGAGTGCGCTTCTGCCGCATGGACGTGGCTGTTCTGAGCTACTGCGTGAGGTG 
CTGCCCTGCTGCACAGGCACTGCGGCTGATCAGCTGCAGATTGGTTGCTGCGCAGGAG 
AAGAAGAAGAAGAGCCTGGGGAAGCGGCTCCAGGCCAGCCTGGGCACCACAAAACAAC 
TGCCAGCCTCCCTTCTTCATCCACTCTTTCAGGCAATGACTGACCCACTGTGCCATCT 
GAGCAGCCTCACGCTGTCCCACTGCAAACTCCCTGACGCGGTCTGCCGAGACCTTTCT 
GAGGCCCTGAGGGCAGCCCCCGCACTGACGGAGCTGGGCCTCCTCCACAACAGGCTCA 
GTGAGGCAGGACTGCGTATGCTGAGTGAGGGCCTAGCCTGGCCGCAGTGCAGGGTGCA 
GACGGTCAGGGTACAGCTGCCTGACCCCCAGCGAGGGCTCCAGTACCTGGTGGGTATG 
CTTCGGCAGAGCCCTGCCCTGACCACCCTGGATCTCAGCGGCTGCCAACTGCCCGCCC 
CCATGGTGACCTACCTGTGTGCAGTCCTGCAGCACCAGGGATGCGGCCTGCAGACCCT 
CAGTCTGGCCTCTGTGGAGCTGAGCGAGCAGTCACTACAGGAGCTTCAGGCTGTGAAG 
AGAGCAAAGCCGGATCTGGTCATCACACACCCAGCGCTGGACGGCCACCCACAACCTC 
CCAAGGAACTCATCTCGACCTTCTGAGGCTCTGGTGGCCAGAGCAGGGTGGAAGACCC 




TAGTCAAAGTCCCTGTGGAGA 








ORF Start: ATG at 26 


ORF Stop: TGA at 2054 




SEQ ID NO: 70 


676 aa 


MW at 74650.3kD 


NOV18a, 

CG58553-01 Protein Sequence 


MAAKNILYDWAAGKLYQGQVDFAFFMPCGELLERPGTRSLADLILDQCPDRGAPVPQM 
LAQPQRLLFILDGADELPALGGPEAAPCTDPFEAASGARVLGGLLSKALLPTALLLVT 
TRAAAPGRLQGRLCSPQCAEVRGFSDKDKKKYFYKFFRDERRAERAYRFVKENETLFA 
LCFVPFVCWIVCTVLRQQLELGRDLSRTSKTTTSVYLLFITSVLSSAPVADGPRLQGD 
LRNLCRLAREGVLGRRAQFAEKELEQLELRGSKVQTLFLSKKELPGVLETEVTYQFID 
QSFQESFAALSYLLEDGGVPRTAAGGVGTLLRGDAQPHSHLVLTTRFLFGLLSAERMR 
DIERHFGCMVSERVKQEALRWVQGQGQGCPGVAPEVTEGAKGLEDTEEPEEEEEGEEP 
NYPLELLYCLYETQEDAFVRQALGRFPELALQRVRFCRMDVAVLSYCVRCCPAAQALR 
LISCRLVAAQEKKKKSLGKRLQASLGTTKQLPASLLHPLFQAMTDPLCHLSSLTLSHC 
KLPDAVCRDLSEALRAAPALTELGLLHNRLSEAGLRMLSEGLAWPQCRVQTVRVQLPD 
PQRGLQYLVGMLRQSPALTTLDLSGCQLPAPMVTYLCAVLQHQGCGLQTLSLASVELS 
EQSLQELQAVKRAKPDLVITHPALDGHPQPPKELISTF 
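
The stated ORF start and stop positions can be checked against the stated polypeptide lengths. The sketch below assumes that the "ORF Stop" position is the first base of the stop codon, so that the coding region spans `stop - start` bases; that reading is an inference from the numbers in Tables 17A and 18A, and `orf_codons` is a hypothetical helper name.

```python
def orf_codons(start: int, stop: int) -> int:
    """Number of coding codons between an ORF start and the first base of its stop codon."""
    length = stop - start
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"ORF {start}..{stop} is not a whole number of codons")
    return length // 3


# (clone, ORF start, ORF stop, stated length in aa) as given in Tables 17A and 18A.
entries = [
    ("NOV17a", 11, 860, 283),
    ("NOV17b", 11, 857, 282),
    ("NOV17c", 4, 850, 282),
    ("NOV17d", 4, 823, 273),
    ("NOV18a", 26, 2054, 676),
]

for clone, start, stop, stated_aa in entries:
    print(clone, orf_codons(start, stop), stated_aa)
```

Under this assumption each computed codon count agrees with the residue count given for the corresponding protein sequence.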



Further analysis of the NOV18a protein yielded the following properties shown in 
Table 18B. 

Table 18B. Protein Sequence Properties NOV18a 

PSort analysis: | 0.7400 probability located in nucleus; 0.6000 probability located in endoplasmic reticulum (membrane); 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial inner membrane
SignalP analysis: | No Known Signal Sequence Predicted 



A search of the NOV18a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 18C. 



Table 18C. Geneseq Results for NOV18a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV18a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAE04546 | Human G-protein coupled receptor-2 (GCREC-2) protein - Homo sapiens, 891 aa. [WO200142288-A2, 14-JUN-2001] | 1..676; 210..891 | 671/682 (98%); 671/682 (98%) | 0.0
AAU00023 | Human activated T-lymphocyte associated sequence 2, ATLAS-2 - Homo sapiens, 1851 aa. [WO200114564-A2, 01-MAR-2001] | 1..633; 210..904 | 605/695 (87%); 610/695 (87%) | 0.0
ABB11735 | Human vasopressin receptor homologue, SEQ ID NO:2105 - Homo sapiens, 597 aa. [WO200157188-A2, 09-AUG-2001] | 1..490; 106..595 | 485/490 (98%); 485/490 (98%) | 0.0
AAR33389 | AII/AVPv2 receptor - Synthetic, 481 aa. [WO9305073-A, 18-MAR-1993] | 193..670; 1..480 | 322/481 (66%); 371/481 (76%) | e-174
AAM89960 | Human immune/haematopoietic antigen SEQ ID NO: 17553 - Homo sapiens, 329 aa. [WO200157182-A2, 09-AUG-2001] | 1..274; 9..282 | 265/274 (96%); 266/274 (96%) | e-151



In a BLAST search of public sequence databases, the NOV18a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 18D. 

Table 18D. Public BLASTP Results for NOV18a 

Protein Accession Number | Protein/Organism/Length | NOV18a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
CAC34689 | SEQUENCE 3 FROM PATENT WO0114564 - Homo sapiens (Human), 1851 aa. | 1..633; 210..904 | 605/695 (87%); 610/695 (87%) | 0.0
Q91WS2 | HYPOTHETICAL 62.5 KDA PROTEIN - Mus musculus (Mouse), 556 aa (fragment). | 107..659; 1..554 | 390/557 (70%); 450/557 (80%) | 0.0
Q63035 | VASOPRESSIN RECEPTOR - Rattus norvegicus (Rat), 483 aa. | 193..670; 1..482 | 324/483 (67%); 372/483 (76%) | e-173
AAL12498 | CRYOPYRIN - Homo sapiens (Human), 920 aa. | 3..657; 234..914 | 232/709 (32%); 355/709 (49%) | 5e-94
AAL12497 | CRYOPYRIN - Homo sapiens (Human), 1034 aa. | 3..648; 234..848 | 223/658 (33%); 344/658 (51%) | 6e-93



Pfam analysis predicts that the NOV18a protein contains the domains shown in 
Table 18E. 



Table 18E. Domain Analysis of NOV18a 

Pfam Domain | NOV18a Match Region | Identities/Similarities for the Matched Region | Expect Value
No Significant Matches Found 



Example 19. 

The NOV19 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 19A. 



Table 19A. NOV19 Sequence Analysis 



SEQ ID NO: 71 


2686 bp 


NOV19a, 

CG58626-01 DNA Sequence 


CCGGCGGCGTCTCCACAGCATGAATTACCCGGGCCGCGGGTCCCCACGGAGCCCCGAG 
CATAACGGCCGAGGCGGCGGCGGCGGCGCCTGGGAGCTGGGCTCAGACGCGAGGCCAG 
CGTTCGGCGGC^TCTCTGCTGCTTCGAGCACCTGCCCGGCGGGGACCCGGACGACXJG 
CGACGTGCCCCTGGCCCTGCTGCGCGX3GGAACCCGGGCTGCATTTGGCX5CCGGGCACC 
GACGACCACAACCACCACCTCGCGCTGGACCCCTGCCTCAGTGACGAGAACTATGACT 
TCAGCTCCGCCGAGTCGGGCTCCTCGCTGCGCTACTACAGCGAGGGTGAGAGCGGCGG 
CGGCGGCAGCTCCTTGTCGCTGCACCCGCCGCAGCAGCCTCCGCTGGTCCCGACGAAC 
TCGGGGGGCGGCGGCGCGACAGGAGGGTCCCCCGGGGAAAGGAAACGTACCCGGCTTG 
GCGGCCCGGCGGCCCGGCACCGCTATGAGGTAGTGACGGAGCTGGGCCCGGAGGAGGT 
ACGCTGGTTCTACAAGGAGGACAAGAAGACCTGGAAGCCCTTCATCGGCTACGACTCG 
CTCCXSCATCGAGCTCGCCTTCCGGACCCTGCTGCAGACCACGGGTGCCO^CCCCAGG 
GCGGGGACCGGGACGGCGACCATGTGTGCTCCCCCACGGGCCCAGCCTCCAGTTCCGG 
AGAAGATG ACX3 ATG AGG ACCGCG CCTG CXX3CTTCTGCCAG AGTACGACGGGGCACGAG 
CCGGAGATGGTGGAGCTTGTGAACATCGAGCCTGTGTGCGTGCGGGGCGGCCTCTACG 
AGGTGGATGTGACCCAAGGAGAGTGCTACCOSGTGTACTGGAACCGTGCTGATAAAAT 
ACCAGTAATGCGTGGACAGTGGTTTATTGACGGCACTTGGCAGCCTCTAGAAGAGGAA 
GAAAGTAATTTAATTGAGCAAGAACATCTCAATTGTTTTAGGGGCCAGCAGATGCAGG 
AAAATTTCGATATTGAAGTGTCAAAATCCATAGATGGAAAAGATGCTGTTCATAGTTT 
CAAGTTG AGTCGAAACCATGTGGACTGG CACAGTGTGGATGAAGT ATATCTT T ATAGT 
GATGCAACAACATCTAAAATTGCAAGAACAGTTACCCAAAAACTGGGATTTTCTAAAG 
CATCAAGTAGTGGTACCAGACTTCATAGAGGTTATGTAGAAGAAGCCACATTAGAAGA 
CAAG CCAT CAC AGACT A C CCAT ATTG T ATTTG TTGTG CATGG CATTGGG C AG AAAATG 
GACCAAGGAAGAATTATCAAAAATACAGCTATGATGAGAGAAGCTGCAAGAAAAATAG 
AAGAAAGGCATTTTTCCAACCATGCAACACATGTTGAATTTCTGCCTGTTGAGTGGCG 
GTCAAAACTTACTCTTGATGGAGACACTGTTGATTCCATTACTCCTGACAAAGTACGA 
GGTTT AAGGG AT ATG CTG AAC AG C AG TG CAAT GG ACAT AATGTATT AT ACT AGT C CAC 
TTTATAGAGATGAACTAGTTAAAGGCCTTCAGCAAGAGCTGAATCGATTGTATTCCCT 
TTTCTGTTCTCGGAATCCAGACTTTGAAGAAAAAGGGGGTAAAGTCTCAATAGTATCA 
CATTC CTTGGG ATGTG T AAT TACTT ATG ACAT AATGACTGG CTGG AATCCAG TTCGGC 
TGTATGAACAGTTGCTGCAAAAGGAAGAAGAGTTGCCTGATGAACGATGGATGAGCTA 
TGAAGAACGACATCTTCTTGATGAACTCTATATAACTAAACGACGGCTGAAGGAAATA 
GAAGAACGGCTTCACGGATTGAAAGCATCATCTATGACACAAACACCTGCCTTAAAAT 
TTAAGGTAGAGAATTTCTTCTGTATGGGATCCCCATTAGCAGTTTTCTTGGCGTTGCG 
TGGCATCCGCCCAGGAAATACraSAAGTCAAGACCATATTTTGCCTAGAGAGATTTGT 
AACCGGTTACTAAATATTTTTCATCCTACAGATCCAGTGGCTTATAGATTAGAACCAT 
TAATACTGAAACACTACAGCAACATTTCACCTGTCCAGATCCACTCGTACAATACTTC 
AAATCCTTT ACCTTATG AACATATG AAG CCAAGCTTTCTCAACC CAGCT AAAGAACCT 
ACCTC7VGTTTCAGAGAATGAAGGCATTTCAACCATACCAAGCCCTGTGACCTCACCAG 
TTTTGTCCCGCCGACACTATGGAGAATCTATAACAAATATAGGCAAAGCAAGCATATT 
AGGTG CTG CT AG C ATTGG AAAGGG ACTTGG AGG AATGTTG TT CT CAAGATTTGG ACGT 
TCATCTACAACACAGTCATCTGAAACATCAAAAGACTCAATGGAAGATGAGAAGAAGC 
CAGTTGCCTCACCTTCTGCTACCACCGTAGGGACACAGACCCTTCCACATAGCAGTTC 
TGGCTTCCTCGATTCTGCAGTGGAGTTGGATCACAGGATTGATTTTGAACTCAGAGAA 
G G CCTTG TGGAG AG C CG CT ATTGG T C AG CTG TCACG TCGCATACTGCCT ATTG G TCAT 
CCTTGGATGTTGCCCTTTTTCTTTTAACCTTCATGTATAAACATGAGCACGATGATGA 
TGCAAAACCCAATTT AG AT CCAATCTGAACTCT CTTG AAGGACATGAATGGCCT AAAA 
CTGATTTTTTTTTTTTCC 




ORF Start: ATG at 20 


ORF Stop: TGA at 2636 




SEQ ID NO: 72 


872 aa 


MW at 97063.4kD 


NOV19a, 

CG58626-01 Protein Sequence 


MNYPGRGSPRSPEHNGRGGGGGAWELGSDARPAFGGGVCCFEHLPGGDPDDGDVPLAL 
LRGEPGLHLAPGTDDHNHHLALDPCLSDENYDFSSAESGSSLRYYSEGESGGGGSSLS 
LHPPQQPPLVPTNSGGGGATGGSPGERKRTRLGGPAARHRYEVVTELGPEEVRWFYKE 
DKKTWKPFIGYDSLRIELAFRTLLQTTGARPQGGDRDGDHVCSPTGPASSSGEDDDED 
RACGFCQSTTGHEPEMVELVNIEPVCVRGGLYEVDVTQGECYPVYWNRADKIPVMRGQ 
WFIDGTWQPLEEEESNLIEQEHLNCFRGQQMQENFDIEVSKSIDGKDAVHSFKLSRNH 
VDWHSVDEVYLYSDATTSKIARTVTQKLGFSKASSSGTRLHRGYVEEATLEDKPSQTT 
HIVFVVHGIGQKMDQGRIIKNTAMMREAARKIEERHFSNHATHVEFLPVEWRSKLTLD 
GDTVDSITPDKVRGLRDMLNSSAMDIMYYTSPLYRDELVKGLQQELNRLYSLFCSRNP 
DFEEKGGKVSIVSHSLGCVITYDIMTGWNPVRLYEQLLQKEEELPDERWMSYEERHLL 
DELYITKRRLKEIEERLHGLKASSMTQTPALKFKVENFFCMGSPLAVFLALRGIRPGN 
TGSQDHILPREICNRLLNIFHPTDPVAYRLEPLILKHYSNISPVQIHWYNTSNPLPYE 
HMKPSFLNPAKEPTSVSENEGISTIPSPVTSPVLSRRHYGESITNIGKASILGAASIG 
KGLGGMLFSRFGRSSTTQSSETSKDSMEDEKKPVASPSATTVGTQTLPHSSSGFLDSA 
VELDHRIDFELREGLVESRYWSAVTSHTAYWSSLDVALFLLTFMYKHEHDDDAKPNLD 
PI 



Further analysis of the NOV19a protein yielded the following properties shown in 
Table 19B. 

Table 19B. Protein Sequence Properties NOV19a 

PSort analysis: | 0.4555 probability located in microbody (peroxisome); 0.4500 probability located in cytoplasm; 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis: | No Known Signal Sequence Predicted 



A search of the NOV19a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 19C. 



Table 19C. Geneseq Results for NOV19a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV19a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAG64151 | Arabidopsis thaliana gravitropism protein - Arabidopsis thaliana, 933 aa. [JP2001120279-A, 08-MAY-2001] | 257..547; 156..454 | 104/316 (32%); 156/316 (48%) | 1e-38
AAM41595 | Human polypeptide SEQ ID NO 6526 - Homo sapiens, 677 aa. [WO200153312-A1, 26-JUL-2001] | 261..548; 52..328 | 94/301 (31%); 138/301 (45%) | 6e-25
AAB92643 | Human protein sequence SEQ ID NO:10972 - Homo sapiens, 1000 aa. [EP1074617-A2, 07-FEB-2001] | 119..608; 226..664 | 132/524 (25%); 204/524 (38%) | 2e-24
AAM39809 | Human polypeptide SEQ ID NO 2954 - Homo sapiens, 615 aa. [WO200153312-A1, 26-JUL-2001] | 274..548; 3..266 | 90/288 (31%); 131/288 (45%) | 4e-23
AAB93825 | Human protein sequence SEQ ID NO:13636 - Homo sapiens, 694 aa. [EP1074617-A2, 07-FEB-2001] | 404..608; 227..449 | 76/229 (33%); 113/229 (49%) | 6e-23



In a BLAST search of public sequence databases, the NOV19a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 19D. 

Table 19D. Public BLASTP Results for NOV19a 

Protein Accession Number | Protein/Organism/Length | NOV19a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
O46606 | PHOSPHATIDIC ACID-PREFERRING PHOSPHOLIPASE A1 - Bos taurus (Bovine), 875 aa. | 1..872; 1..875 | 802/876 (91%); 829/876 (94%) | 0.0
Q9C0F8 | KIAA1705 PROTEIN - Homo sapiens (Human), 498 aa (fragment). | 378..872; 4..498 | 493/495 (99%); 494/495 (99%) | 0.0
096T L2 | CDNA FLJ2540R FIS, CLONE TST02965, HIGHLY SIMILAR TO BOS TAURUS PHOSPHATIDIC ACID-PREFERRING PHOSPHOLIPASE A1 MRNA - Homo sapiens (Human), 454 aa. | 419..872; 1..454 | 453/454 (99%); 454/454 (99%) | 0.0
AAH18552 | HYPOTHETICAL 27.3 KDA PROTEIN - Mus musculus (Mouse), 249 aa (fragment). | 624..869; 1..246 | 224/246 (91%); 236/246 (95%) | e-130
AAL32232 | HYPOTHETICAL 85.1 KDA PROTEIN - Caenorhabditis elegans, 753 aa. | 122..867; 11..750 | 255/794 (32%); 374/794 (46%) | 6e-91



Pfam analysis predicts that the NOV19a protein contains the domains shown in 
Table 19E. 



Table 19E. Domain Analysis of NOV19a 

Pfam Domain | NOV19a Match Region | Identities/Similarities for the Matched Region | Expect Value
DUF203: domain 1 of 1 | 252..458 | 42/219 (19%); 105/219 (48%) | 7.5
DDHD: domain 1 of 1 | 611..858 | 96/266 (36%); 236/266 (89%) | 3.3e-116
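
The two Pfam hits in Table 19E differ enormously in statistical support, which a simple filter makes explicit. The sketch below is illustrative only: the cutoff of 1e-5 is an arbitrary assumption chosen for the example, not a threshold stated in this document, and `significant_hits` is a hypothetical helper.

```python
def significant_hits(hits, cutoff=1e-5):
    """Return the names of domain hits whose Expect value is at or below the cutoff."""
    return [name for name, _region, expect in hits if expect <= cutoff]


# (Pfam domain, NOV19a match region, Expect value) copied from Table 19E.
pfam_hits = [
    ("DUF203", "252..458", 7.5),
    ("DDHD", "611..858", 3.3e-116),
]

print(significant_hits(pfam_hits))
```

With this assumed cutoff only the DDHD domain is retained, while the DUF203 match, with an Expect value of 7.5, would be treated as likely due to chance.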



Example 20. 



The NOV20 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 20A. 



Table 20A. NOV20 Sequence Analysis 

SEQ ID NO: 73 


773 bp 


NOV20a, 

CG57597-01 DNA Sequence 


GGTAAGGACACAAGATGCCAAATAGGGTAAGGAATGGTCCAGAAACCTGTGAACTCTG 
CATTGCAGGCATGCACCACCACTCCTGGCTAATTTTTTGTATTTTTAGTGCCATCGAA 
TCCGGCTCAAACCTTTTATTTCTCTTATGTAAAAGCTGTGTACTTCAGAAAAACATGT 
ACAGTTATCCCTGGCAGTGCCGGGGTGGGGTCTGCGCGGCCCTGGAGGCCTGGCCGGC 
CTTGCAGATCGCTGTGGAGAATGGCTTCGGGGGTGTGCACAGCCAGGAGAAGGCCAAG 
TGGCTGGGGGGTGCAGTGGAGGATTACTTCATGCGCAATGCTGACTTGGAGCTAGATG 
AGGTGGAAGACTTCCTTGGAGAGCTGTTGACCAACGAGTTTGATACAGTTGTGGAAGA 
CGGGAGTCTGCCCCAGGTGAGCCAGCAACTGCAGACCATGTTCCACCACTTCCAGAGG 
GGTGATGGGGCTGCTCTGAGGGAGATGGCCTCCTGCATCACTCAGAGAAAATGCAAGG 
TCACAGCCACTGCACTTAAGACAGCTAGAGAGACTGATGAGGATGAAGATGATGTGGA 
CAGTGTGGAAGAGATGGAGGTCACAGCTACGAATGATGGGGCTGCTACAGATGGGGTC 
TGCCCCCAGCCTGAACCCTCTGATCCAGACGCTCAGACTATTAAGGAAGAGGATATAG 
TGGAAGATGGCTGGACCATTGTCCGGAGAAAAAAATGAGTGGGGATGATTGGAAATGG 
CTTTGGGCCCTTATTTGCT 




ORF Start: ATG at 15 


ORF Stop: TGA at 732 




SEQ ID NO: 74 


239 aa 


MW at 26579.5kD 


NOV20a, 

CG57597-01 Protein Sequence 


MPNRVRNGPETCELCIAGMHHHSWLIFCIFSAIESGSNLLFLLCKSCVLQKNMYSYPW 
QCRGGVCAALEAWPALQIAVENGFGGVHSQEKAKWLGGAVEDYFMRNADLELDEVEDF 
LGELLTNEFDTVVEDGSLPQVSQQLQTMFHHFQRGDGAALREMASCITQRKCKVTATA 
LKTARETDEDEDDVDSVEEMEVTATNDGAATDGVCPQPEPSDPDAQTIKEEDIVEDGW 
TIVRRKK 



Further analysis of the NOV20a protein yielded the following properties shown in 
Table 20B. 



Table 20B. Protein Sequence Properties NOV20a 


PSort 
analysis: 


0.3000 probability located in nucleus; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.0000 probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV20a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 20C. 



Table 20C. Geneseq Results for NOV20a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV20a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAG81374 | Human AFP protein sequence SEQ ID NO: 266 - Homo sapiens, 191 aa. [WO200129221-A2, 26-APR-2001] | 61..239; 13..191 | 178/179 (99%); 178/179 (99%) | e-101
AAG57770 | Arabidopsis thaliana protein fragment SEQ ID NO: 74486 - Arabidopsis [...] SEP-2000] | 63..239; 18..178 | 56/182 (30%); 94/182 (50%) | 1e-13
AAG57771 | Arabidopsis thaliana protein fragment SEQ ID NO: 74487 - Arabidopsis thaliana, 156 aa. [EP1033405-A2, 06-SEP-2000] | 1..150 | [illegible]; 89/171 (51%) | 2e-11



In a BLAST search of public sequence databases, the NOV20a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 20D. 



Table 20D. Public BLASTP Results for NOV20a 

Protein Accession Number | Protein/Organism/Length | NOV20a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q969E8 | UNKNOWN (PROTEIN FOR MGC:20451) (PROTEIN FOR IMAGE:3953868) - Homo sapiens (Human), 191 aa. | 61..239; 13..191 | 178/179 (99%); 178/179 (99%) | e-101
Q9NAD8 | Y51H4A.15 PROTEIN - Caenorhabditis elegans, 225 aa. | 1..239; 1..225 | 66/239 (27%); 122/239 (50%) | 5e-23
Q06672 | HIGHLY ACIDIC C-TERMINUS - Saccharomyces cerevisiae (Baker's yeast), 249 aa. | 63..238; 79..244 | 46/177 (25%); 82/177 (45%) | 5e-11
Q9VBI0 | CG14543 PROTEIN - Drosophila melanogaster (Fruit fly), 195 aa. | 71..238; 24..195 | 49/174 (28%); 81/174 (46%) | 2e-10
Q9UUA9 | HYPOTHETICAL HIGHLY ACIDIC C-TERMINUS PROTEIN - Schizosaccharomyces pombe (Fission yeast), 179 aa. | 70..239; 22..178 | 42/172 (24%); 83/172 (47%) | 2e-06



Pfam analysis predicts that the NOV20a protein contains the domains shown in 
Table 20E. 



Table 20E. Domain Analysis of NOV20a 

Pfam Domain | NOV20a Match Region | Identities/Similarities for the Matched Region | Expect Value
No Significant Matches Found 

Example 21. 

The NOV21 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 21A. 



Table 21A. NOV21 Sequence Analysis 



SEQ ID NO: 75 


7741 bp 


NOV21a, 

CG57804-01 DNA Sequence 


TTGTCTCTTTGTGTTTTCCAGACATTCTAAGTGAGACTGTCCACATCATCTAGGAAAA 


TGGTGGCCCTGTCCTTAAAGATTTGTGTGCGCCACTGCAACGTGGTGAAGACCATGCA 
GTTTGAACCATCTACAGCTGTGTACGATGCGTGTCGAGTCATTCGGGAACGGGTGCCT 
GAGGCACAAACTGGGCAAGCTTCTGACTATGGACTCTTTCTTTCGGATGAAGACCCGA 
GG AAAGGG ATTTGGCTGG AAG CGGGCAGAACACTGGATTACTACATGTTG CGG AATGG 
GG ATATTTTGGAAT ATAAAAAGAAACAGAGAC CTCAGAAAATCCGGATG CTGGATGG A 
TCTGTGAAGACAGTGATGGTGGATGATTCCAAGACTGTGGGGGAGCTCCTGGTCACTA 
TTTGTAGCAGAATAGGAATAACAAATTATGAAGAATACTCCTTAATCCAAGAAACTAT 
TGAAGAAAAGAAAGAGGAAGGAACGGGCACACTCAAAAAAGACAGGACACTGTTACGA 
G ATG AGAGGAAAATGGAGAAGTTG AAGGC CAAG CTG CACACAGATGATGACCTAAATT 
GGCTGGATCACAGCCGAACATTCAGAGAACAAGGAGTAGATGAAAACGAAACGTTGCT 
G CTT AGACGGAAGTT CTTTT ACTCTGATC AG AATGTAGATTCGAG AGACCCCGTGCAG 
CTG AACTTGCTTTATGTTCAGG CACGGGATG AC ATCCTG AATGG CTCTCACCCTGT CT 
CCTTCGAGAAAGCTTGTG AGTTTGGTGGATTTC AAGCC CAG AT ACAATTTGGAC CTCA 
TGTGGAACATAAACACAAACCTGGATTTTTAGATCTGAAGGAATTCCTGCCCAAAGAA 
TATATCAAGCAGAGAGGAGCTGAAAAGAGGATCTTTCAGGAGCATAAGAACTGCGGAG 
AGATGAGTGAGATAGAAGCCAAGGTCAAGTACGTCAAACTCGCACGGTCCCTCCGCAC 
ATATGGCGTGTCCTTCTTCCTGGTGAAGGAGAAGATGAAAGGCAAGAACAAGCTGGTG 
CCTCGCCTGCTGGGGATCACCAAAGACTCGGTGATGCGCGTGGATGAGAAGACCAAGG 
AAGTGCTGCAGGAGTGGCCCCrrCACCACCGTCAAGCGCTGGGCAGCCTCACCCAAGAG 
CTT CAC ACTGGATTTTG GGG AG T ATCAGG AAAG CT A CTATT CAGT ACAAACCAC CGAG 
GGAGAGCAGATATCCCAGCTGATTGCAGGCTACATTGACATCATCCTXaAAAAAGGGAA 
CATACGTGACATCTCTGGGGTCTCCTCATTGCACTCCACATGGCTGGTGTTCTCTCAG 
TGACCAAACCACTTTTCCCC^CAGGTCCACCATCrTGCAGCAGCAGTTCAACCGGACC 
GGGAAGGCAGAGCACGGCTCAGTGGCGCTGCCGGCCGTGATGCGCTCGGGCTCCAGCG 
GG CCTG AG AC CTT CAACGTTGGC AG CATG C C CT CG CCA C AG CAG C AGGT CATGG TTGG 
G CAGATGCAC CGAGGCCACATGCCG CCACTG ACCTCAGCCCAG CAGGCCCTGATGGGG 
AC CATC AAC A CAAG CATG C ACG C CG TC CAG CAGG C C C AGG ATG ATCTCAGTG AG CT CG 
ACTCGCTGCCACCTCTCGGCCAGGATATGGCATCTAGGGTATGGGTTCAGAACAAAGT 
CGACGAATCCAAACACGAAATCCATTCTCAAGTTGATGCTATCACGGCCGGAACGGCT 
TCAGTTGTTAACCTCACAGCTC^TGACCCTGCAGACACTGACTACACAGCTGTGGGAT 
GTGCGATCACCACTATTTCTTCCAACCTGACGGAGATGTCCAAGGGTGTGAAGCTATT 
GGCCGCCCTCATGGATCATGAGGTX^C^GCGGGGAGGACTTGCTCAGAGCTGCCAGG 
ACCCTCGCTGGGGCGGTGTCAGACTTGCTGAAAGCTGTGCAGCCTACTTCTGGAGAGC 
CTCGACAGACAGTTTTGACTGCTGCTGGCAGCATCGGACAAGCCAGTGGGGATCTTCT 
G AG AC AG ATTGG AG AG AATG AG ACTG ATG AG CG ATT C CAGG ATG TTTT AATG AG TTTG 
GCCAAAGCTGTTGCCAATGCAGCTGCCATGTTGGTACTAAAGGCAAAGAATGTTGCCC 
AAGTGGCCGAAGACACTGTCCTACAGAA(^GGGTAATTGCTGCTGCCACCCAGTGTGC 
CCTCTCCACCTCCCAGCTTGTGGCATGTGCCAAGGTTGTGAGCCCCACTATTAGCTCC 
CCTGTGTCCCAGGAGCAGCTCATTX3AAGCAGGGAAGCTGGTCGACCGCTCGGTGGAGA 
ACTGTGTCCGTGCCTGCCAGGCGGCCACTACCGATAGTGAGCTCCTGAAGCAGGTCAG 
CGCAGCGGCCAGCGTGGTCAGCCAGGCCCTC CATG ATCTCCTG CAG CATGTGCGGCAG 
TTTGCCAGCCGAGGCGAGCCCATCGGCCGCTACGACCAGGCTACTGACACCATCATGT 
GTGTCACCGAG AG CATCTTCAGCTCCATGGGTGACGCTGGTGAAATGGTGCGCCAGG C 
GCGGGTTCTGGCCCAAGCCACATCAGACCTCGTCAATGCCATGAGGTCAGATGCAGAA 
G C C G AAAT CGACATGG AGAATT CAAAG AAG CTC CTGG CAGCAG CAAAAC TCTT AG CTG 
ACTCCACTGCTCGCATGGTGGAAGCTGCAAAGGGGGCTGCAGCCAACCCAGAGAATGA 
GG AC CAG CAG CAAAGGCTG AG AG AAGCTGC AG AAGGC CTCCGGGTAGC AACCAACGCA 
GCTGCCCAGAATG CTATT AAGAAAAAAATTGTCAACCGACTGGAGGTTGC7VGCCAAGC 
AGGCCGCAGCGGCAGCCACACAGACCATCGCCGCCTCCCAGAATGCAGCTGTTTCCAA 
CAAGAACCCTGCGGCCCAGCAGCAGCTGGTCCAGAGTTGCAAGGCAGTGGCTGATCAC 
ATCCCTCAGCTGGTCCAGGGAGTGAGGGGGAGCCAAGCTCAAGCTGAAGACCTGAGTG 
CCCAGCTGGCTCTCATCATCTCCAGCCAGAACTTCCITCCAGCCTGGAAGCAAGATGGT 
GTCCTCTGCCAAAGCCGCAGTCCCCACCGTGAGTGACCAGGCCGCAGCCATGCAGCTG 
AGCCAGTGTGCCAAGAACCTCGCCACCAGCTTGGCJGGAGCTGCGTACCGCCTCGCAGA 
AGGCCCATGAAGCTTGTGGTCCGATGGAAATCGATTCAGCTCTGAATACGGTGCAGAC 
G CTT AAG AATG AACTGCAGG ATG CCAAG ATGGCAGCCGTGGAG AGCCAG CTG AAGCCA 
CTTCCAGGGG AAACG CTGG AAAAATGTG CT CAGG ACCTGGG AAGCACAT CCAAGGCGG 
TGGGCTCCTCCATGG CACAGCTG CTGAC CTGTG CTGCTCAAGG CAACGAACACTACAC 
AGGGGTGGCTGCTAGAGAGACGGCCCAAGCTCTCAAAACACTGGCCCAGGCCGCCCGT 
GGAGTGGCTGCATCGACAACCGACCCCGCGGCCGCCCATGCCATGTTAGATTCTGCTC 
G AG ACGTGATGG AGGGCTCCGCCATG CTCATTCAAGAGG CCAAGCAGGCCCTGATTGC 
ACCTGG AGATGCAGAGCGTCAACAAAG ACTGGCTCAGGTGG CT AAAG CCGTCTCACAC 
TCCTTGAATAACTGCGTAAATTGCCTCCCTGGGCAGAAGGATGTGGACGTGGCCTTGA 
AGAGCATCGGGGAGTCCAGCAAGAAGCTGCTTGTGGATTCGCTACCTCCAAGCACGAA 
GCCTTTCCAGGAAGCCCAGAGTGAACTGAACCAGGCAGCAGCTGATCTGAACCAGTCT 
GCTGGGGAAGTGGTCCATGCCACC CGGGGCCAGAGTGG AG AGTTGG CTGCAG CCTCTG 
GAAAGTTCAGTGATGATTTTGGTGAATTCCTCGATGCTGGCATTGAGATGGCTGGCCA 
AGCTCAGACAAAAGAAGACCAGATCCAAGTGATAGGGAACCTCAAGAATATCTCGATG 
GC ATCCAG CAAGCTGCTGTT AG CTGCCAAGTCTCTCTCTGTAG ATCCAGGAGCTCCCA 
ATGCGAAAAATCTCCTGGCTGCAGCTGCAAGAGCTGTGACAGAGAGCATCAATCAACT 
CATCACTCTGTGTACCCAACAAGCTCCGGG CCAG AAAGAGTG CGATAATG CCCTGCGG 
GAGCTCGAGACTGTGAAGGGGATGTTGGACAATCCTAATGAACCTGTTAGTGACCTCT 
CrTACTTTGACTGCATTGAGAGTGTGATGGAAAACTCCAAGGTTCTGGGTGAATCGAT 
GGCAGGGATTTCACAGAATGCCAAGACCGGAGACCTCCCTGCCTTTGGGGAATGTGTG 
GGGATTGCAT CCAAGGCTCTCTGTGGG CTGACAGAGGCTG CAGCCCAGGCTGCATACT 
TGGTTGGCATCTCTGATCCAAACAGCCAGGCAGGCCACCAGGGCCTGGTGGACCCCAT 
CCAGTTTGCC AGGGCTAACCAGGC CATC CAGATGGCATGCCAGAACTTGGTGG ACC CT 
GGCAGC AGCCCATCACAGGTCCTGTCAG CCGCCACAATTGTTG CCAAGCAC ACGTCAG 
CCTTGTGCAATGCCTGCCGCATCGCCTCATCCAAGACGGCCAACCCAGTAGCCAAGAG 
GCACTT CGTCCAGTC AGCCAAGG AAGT CGCCAACAGCACTGCCAACCTGGTGAAGACC 
ATCAAGGCCCTGGATGGGGATTTCTCTGAAGACAACCGCAATAAGTGTCGCATCGCCA 
CCGCAC CCTTGATTG AAGCTGTGG AG AACCTGACAGCGTT CGCCTCAAACC CTGAGTT 
TGTC AG C ATTCCTGCCCAG AT CAG CTCCGAGGGTTCCCAGGC ACAGGAACC AATCCTG 
GTCTCAGCCAAGACCATGCTGGAGAGTTCATCGTACCTCATTCGCACTGCACGCTCTC 
TGGCCATCAACCCCAAAGACCCACCCACCTGGTCTGTACTGGCTGGACATTCCCATAC 
AGTGTCCGACTCCATCAAGAGTCTCATCACTTCTATCAGGGACAAGGCCCCTGGACAG 
AGGGAGTGTGATTACTCCATCGATGGCATCAACCGGTGCATCCGGGACATCGAGCAGG 
CCTCGCTGGCCGCCGTCAGCCAGAGCCTGGCCACGAGGGACGACATCTCTGTGGAGGC 
CCTGCAGGAGCAGCTGACTTCGGTGGTCCAGGAAATCGGACACCTTATCGATCCCATC 
GCCACAGCGGCTCGGGGAGAAGCAGCTCAGCTGGGACATAAGGTGACACAACTGGCAA 
GCTATTTTGAGCCCTTGATCTTAGCCGCAGTTGGTGTGGCCTCCAAGATTCTTGATCA 
TCAG CAGCAGATGACGGTG CTGGACCAG ACCAAGACTCTCGCAG AGTCTGCCTTGCAG 
ATGTTGTATGCAGCCAAAGAAGGTGGCGGAAACCCCAAGGCACAACACACCCATGACG 
CCATC ACAG AGGCCGCCCAGTTGATGAAGGAAGC CGTGGATG ACATCATGGTG ACG CT 
GAACGAAGCTGCCAGTGAAGTGGGGCTGGTTGGGGGCATGGTGGACGCCATTGCAGAA 
GCCATGAGCAAGCTGGATGAAGGCACTCCTCCAGAACCAAAGGGAACATTTGTCGACT 
ATCAG ACGACTGTGGTTAAAT ACT CCAAAG CCATTGCGGTGACAGCTCAGG AAATGAT 
GACTAAGTCGGTTACTAACCCGGAGGAGTTGGGAGGACTGGCTTCACAAATGACCAGT 
GACT ATGGGCAC CTGG CTT TCCAGGGC CAG ATGG CAGCAGC CACGG CGGAACCAG AGG 
AGATCGGATTCCAGATTCGCACTCGTGTGCAGGACCTGGGCCACGGCTGTATCTTCCT 
GGTG CAGAAGGCAGGGGCCCTCCAGGT CTGC CCCACAG ACAG CTACACC AAGAGGG AG 
CTGATCGAATGCGCCCGTGCCGTCACGGAAAAGGTCTCCTTGGTGCTCTCGGCTCTCC 
AGG C CGGGAACAAAGGAAC CCAGGCATGCATT ACAGCCGCCACCGCTGTGT CTGGG AT 
CATTGC CGAC CTGG ACACC ACC ATT ATGTTTGCAACAGCGGGGACGCTG AATGC AG AG 
AACAGTGAGACCTTCGCAGACCACAGGGAGAACATTCTCAAGACGGCCAAGGCCTTGG 
TAGAAGACACGAAACTACTTGTGTCAGGAGCTGCGTCCACTCCTGACAAGCTGGCCCA 
GGCGGCCCAGTCCTCAGCAGCCACCATCACCCAGCTCGCAGAAGTGGTCAAGCTGGGG 
GCAGCCAGCCTGGGCTCCGACGACCCCGAGACCCAGGTGGATTTGATCAATGCCATCA 
AAGATGTGGCCAAGGCCCTTTCTGATCTCATCAGTGCTACCAAGGGAGCTGCCAGCAA 
GCCAGTGGACGACCCTTCCATGTACCAGCTCAAGGGGGCTGCCAAGGTGATGGTGACC 
AATGT CACCTCG CT CCTCAAGACTGTAAAGG C AGTGG AGGATGAGGCCAC C CGGGG CA 
CCAGGGCGCTTGAGGCCACAATTGAATGCATAAAGCAGGAGCTTACGGTGTTCCAGTC 
AAAAGACGTACCTGAAAAGACATCATCACCTGAAGAATCCATAAGGATGACGAAAGGC 
ATCACCATGGCAACAGCCAAAGCCGTGGCAGCTGGGAACTCATGTAGACAGGAGGACG 
TGATTG CTACTGCC AACCTGAG CCGGAAAG CCGTGTC AGAT ATGTTG ACGGCTTG C AA 
GCAAG CATC CTTCCACCCCG ATGTCAGTGACG AGGTGAG AAC CAGAGCCTTGCGTTTC 
GGGACGGAGTGCACCCTTGGCTACTTGGACCTCCTGGAGCACGTCTTGGTGATTCTTC 
AGAAACC AA C C C C AG AATT CAAGCAGCAG CTGG C CG CTTT CT C C AAG CG AG TCG C CGG 
CGCTGTGACAGAGCTCATCCAGGCGGCGGAAGCCATGAAAGGAACAGAGTGGGTGGAT 
CCAGAAGACCCAACTGTCATTGCAGAAACAGAGTTACTGGGGGCTGCAGCATCCATCG 
AAGCTGCTGCTAAGAAGTTAGAGCAACTGAAGCCAAGAGCAAAACCAAAACAAGCGGA 
TGAGACCCTGGACTTTGAGGAACAGATCTTGGAAGCTGCTAAATCCATTGCTGCTGCC 
ACAAGCGCCCTGGTCAAATCGGCCTCAGCAGCCCAGAGGGAGCTGGTGGCCCAAGGAA 
AGGTGGGCT CCAT CC CTGCC AATGCTG CAGACGACGGACAGTGGTCACAGGGGCTG AT 
TTCTGCTGCCCGGATGGTGGCGGCTGCGACCAGCAGTCTCTGTGAGGCGGCCAATGCC 
TC C G TTC AGGG ACACG C CAG CGAGG AG AAGCT CATCT CATCTG CC AAG CAGGT CGC CG 
CTTCCACGGCTCAGCTGCTGGTGGCCTGCAAGGTGAAGGCCGACCAGGATTCAGAGGC 
CATGAGGCGGCT ACAGGCGG CAGGAAATG CTGTGAAAAGAGCCTCAGACAATCTTGTC 
CGTGCAGCCCAGAAGGCAGCTTTTGGCAAAGCTGATGACGACGATGTTGTAGTGGAAA 
CCAAGTTTGTGGGGGGCATTGCTCAGATCATCGCCGCCCAGGAAGAAATGCTAAAGAA 
AGAGCGAGAACTGGAAGAAGCAAGGAAAAAACTGGCCCAAATCCGCCAGCAGCAGTAT 
AAGTTTTTACCCACCGAGCTGAGGGAAGATGAGGGCTAAAGGTGCGAGCCCAGATGGC 
GAGCCCCAGGGGATGGCCCTGGCTGAA 




ORF Start: ATG at 58 


ORF Stop: TAA at 7693 




SEQ ID NO: 76 


2545 aa 


MW at 271692.8kD 


NOV21a, 
CG57804-01 Protein Sequence 


fWAIiSLKICVRHCNVWTWQFEPSTAVYDACRVIRERVPEAQTGQASDYGLFLSDEDP 
RKGIWLEAGRTLDYYMLRNGDILEYKKKQRPQKIRMIjIX3SVKTVMVDDSKTVGE 



ICSRIGITNYEEYSLIQETIEEKKEEGTGTLKKDRTLLRDERKMEKLKAKLHTDDDLN 
WLDHSRTFREQGVDENETLLLRRKFFYSDQNVDSRDPVQLNLLYVQARDDILNGSHPV 
SFEKACEFGGFQAQIQFGPHVEHKHKPGFLDLKEFLPKEYIKQRGAEKRIFQEHKNCG 
EMSEIEAKVKYVKLARSLRTYGVSFFLVKEKMKGKNKLVPRLLGITKDSVMRVDEKTK 
EVLQEWPLTTVKRWAASPKSFTLDFGEYQESYYSVQTTEGEQISQLIAGYIDIILKKG 
T YVT S VG S PHCT PHGWC S LS DQTT F PG RST I LQQQFNRTGKAEKGS VAL P AVMRSGS S 
GPETFNVGSMPSPQQQVKVGQMHRGHMPPLTSACJQALMGTINTSMHAVQQAQDDLSEL 
DS LP PLGQDMASRVWVQNKVDE SKHE I HSQVDAI TAGTASVVNLTAGDPADTD YTAVG 
CAITTISSMLTEMSKGVKLLAALMDDEVGSGEDLLRAARTIjAGAVSDLLkAVQPTSGE 
PRQTVLTAAGSIGQASGDLLRQ I GENETDERFQDVLMS LA KA V AN AAAM LVL KAKNV A 
QVAEDTVLQNRVIAAATQCALSTSQLVACAKWSPTISSPVCQEQLIEAGKLVDRSVE 
NCVRACQAATTDS ELLKQVSAAAS WSQALHDLLQHVRQFASRGE PIGRYDQATDTIM 
CVTES I FSSMGDAGEMVRQARVIAQATSDLVNAMRSDAEAE I DMENSKKLLAAAKLLA 
DSTARMVEAAKGAAANPENEDQQQRLREAAEGLRVATNAAAQNAIKKKIVNRLEVAAK 
QAAAAATQT I AASQNAAVSNKN P AAQQQL VQS CKAVADHI PQLVQG VRG S QAQAEDLS 
AQLALI I SSQNFLQPGS KMVS SAKAAVPTVSDQAAAMQLSQCAKNLATSLAELRTASQ 
KAHEACGPMEIDSAI^TVQTLK^ELQDAKMAAVESQLKPLPGETLEKCAQDLGSTSKA 
VG S S MAQ LLTCAAQGN E H YTG VAARE T AQ AL KT LAQ AARG VAAS TT D P AAAHAMLD S A 
RD VMEG S AML I Q EAKQAL I APGD AERQQRLAQ VAKAVS HSLNNCVNCL PGQKD VDVAL 
KS I GES SKKLLVDSLP PSTKP FQEAQS ELNQAAADLNQSAGEWHATRGQSGELAAAS 
GKFS DDFGE FLDAG I EMAGQAQT KEDQ I QV I GNLKN I S MAS S KLLLAAKS LS VD PG AP 
NAKNLLAAAARAVTESINQLITLCTQQAPGQKECDNALRELETVKGMLDNPNEPVSDL 
S YFDCI ESVMENSKVLGE SMAG I SQNAKTGDLPAFGECVGI ASKALCGLTEAAAQAAY 
LVGI SDPNSQAGHQGLVDPIQFARANQAI QMACQNLVD PGS S PSQVLSAATI VAKHTS 
ALCNACRIASSKTANPVAKRHFVQSAKEVANSTANLVKTIKALDGDFSEDNRNKCRIA 
TAPLI EAVENLTAFASNPEFVS I PAQI SSEGSQAQE PI LVS AKTMLESS S YLI RTARS 
LAINPKDPPTWS VLAGHSHTVSDS I KSLI TS I RDKAPGQRECDYS I DG I NRCI RDI EQ 
ASLAAVSQS LATRDDIS VEALQEQLTS WQE IGHLI DP I ATAARGEAAQLGHKVTQLA 
SYFEPLI LAAVGVASKI LDHQQQMTVLDQTKTLAESALQMLYAAKEGGGNPKAQHTHD 
AITEAAQLMKEAVDDIMVTLNEAAS EVGLVGGMVDAI AEAMSKLDEGT PPE PKGTFVD 
YQTTWKYSKAIAVTAQEhlhrrKSVTNPEEIXSGIaASQMTSDYGHIiAFQGQMAAATAEPE 
EIGFQIRTRVQDLGHGCIFLVQKAGALQVCPTDSYTKRELIECARAVTEKVSLVLSAL 
QAGNKGTQAC ITAATAVSG 1 1 ADLDTTIMFATAGTLNAENS ETFADHRENI LKTAKAL 
VEDTKLLVSGAASTPDKLAQAAQSSAATITQLAEVVKLGAASLGSDDPETQVDLINAI 
KDVAKALSDLISATKGAASKPVDDPSMYQLKGAAKVMVTNVTSLLKTVKAVEDEATRG 
TRALEATIECIKQELTVFQSKDVPEKTSSPEESIRMTKGITMATAKAVAAGNSCRQED 
VI ATANLS RKAVSDMLT ACKQ AS FH PDVS D E VRTRALRFGT ECTLG YLDLLEHVL V I L 
QKPTPEFKQQLAAFSKRVAGAVTELIQAAEAMKGTEWVDPEDPTVIAETELLGAAASI 
EAAAKKLEQLKPRAKPKQADETLDFEEQI LEAAKS I AAATSALVKSASAAQRELVAQG 
KVGS I PANAADDGQWSQG L I SAARMVAAAT S S LC EAANAS VQGHAS EE KL I SS AKQ VA 
ASTAQLLVACKVKADQDSEAMRRLQAAGNAVKRASDNLVRAAQKAAFGKADDDDVVVE 
TKFVGGIAQIIAAQEEMLKKERELEEARKKLAQIRQQQYKFLPTELREDEG 



Further analysis of the NOV21a protein yielded the following properties shown in 
Table 21B. 



Table 21B. Protein Sequence Properties NOV21a 

PSort analysis: 0.5964 probability located in mitochondrial matrix space; 0.3037 probability 
located in mitochondrial inner membrane; 0.3037 probability located in 
mitochondrial intermembrane space; 0.3037 probability located in mitochondrial 
outer membrane 

SignalP analysis: No Known Signal Sequence Predicted 
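
The PSort line above can be read programmatically. The following is a minimal sketch, assuming the result is available as a single semicolon-separated string in the form printed in Table 21B; the helper name `parse_psort` is hypothetical and is not part of PSort itself.

```python
import re

def parse_psort(text: str) -> dict:
    """Map each predicted location to its reported PSort probability."""
    # Each entry looks like "0.5964 probability located in <location>".
    pattern = re.compile(r"(\d\.\d+) probability located in ([^;]+)")
    return {loc.strip(): float(prob) for prob, loc in pattern.findall(text)}

# PSort result for NOV21a, copied from Table 21B.
psort_nov21a = (
    "0.5964 probability located in mitochondrial matrix space; "
    "0.3037 probability located in mitochondrial inner membrane; "
    "0.3037 probability located in mitochondrial intermembrane space; "
    "0.3037 probability located in mitochondrial outer membrane"
)

scores = parse_psort(psort_nov21a)
# Pick the location with the highest reported probability.
best_location = max(scores, key=scores.get)
print(best_location, scores[best_location])
```

The same function should apply to the PSort entries of the other NOV proteins, provided they follow the same wording.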



A search of the NOV21a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 21C. 

Table 21C. Geneseq Results for NOV21a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV21a Residues | Match Residues | Identities for the Matched Region | Similarities for the Matched Region | Expect Value 
AAB41087 | Human ORFX ORF851 polypeptide sequence SEQ ID NO: 1702 - Homo sapiens, 2541 aa. [WO200058473-A2, 05-OCT-2000] | 1..2543 | 1..2540 | 1913/2546 (75%) | 2231/2546 (87%) | 0.0 
AAM39312 | Human polypeptide SEQ ID NO 2457 - Homo sapiens, 1165 aa. [WO200153312-A1, 26-JUL-2001] | 1381..2545 | 1..1165 | 1161/1165 (99%) | 1163/1165 (99%) | 0.0 
AAM79794 | Human protein SEQ ID NO 3440 - Homo sapiens, 1177 aa. [WO200157190-A2, 09-AUG-2001] | 1378..2545 | 10..1177 | 1156/1168 (98%) | 1160/1168 (98%) | 0.0 
AAM41098 | Human polypeptide SEQ ID NO 6029 - Homo sapiens, 1177 aa. [WO200153312-A1, 26-JUL-2001] | 1378..2545 | 10..1177 | 1156/1168 (98%) | 1160/1168 (98%) | 0.0 
AAM41079 | Human polypeptide SEQ ID NO 6010 - Homo sapiens, 1177 aa. [WO200153312-A1, 26-JUL-2001] | 1378..2545 | 10..1177 | 1156/1168 (98%) | 1160/1168 (98%) | 0.0 
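
Each identity or similarity cell in the table above pairs a count of matching positions with the aligned length and a whole-number percentage. The sketch below shows one way to split such a cell and recompute the exact ratio; the function names are hypothetical, and the code makes no assumption about how the reported percentage was rounded.

```python
import re

def parse_match(cell: str):
    """Split a cell such as '1913/2546(75%)' into
    (matches, aligned_length, reported_percent)."""
    m = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)%\s*\)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    matches, length, reported = (int(g) for g in m.groups())
    return matches, length, reported

def exact_percent(cell: str) -> float:
    """Recompute the percentage from the two counts in the cell."""
    matches, length, _ = parse_match(cell)
    return 100.0 * matches / length

# Identity cell of the first Geneseq hit (AAB41087).
print(round(exact_percent("1913/2546(75%)"), 2))
```

Comparing the recomputed value with the reported one is a quick way to spot cells that may have been damaged in transcription.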



In a BLAST search of public sequence databases, the NOV21a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 21D. 



Table 21D. Public BLASTP Results for NOV21a 

Protein Accession Number | Protein/Organism/Length | NOV21a Residues | Match Residues | Identities for the Matched Portion | Similarities for the Matched Portion | Expect Value 
Q9Y490 | Talin - Homo sapiens (Human), 2541 aa. | 1..2543 | 1..2540 | 1910/2546 (75%) | 2230/2546 (87%) | 0.0 
P26039 | Talin - Mus musculus (Mouse), 2541 aa. | 1..2543 | 1..2540 | 1907/2546 (74%) | 2230/2546 (86%) | 0.0 
Q9UPX3 | KIAA1027 PROTEIN - Homo sapiens (Human), 1695 aa (fragment). | 853..2543 | 1..1694 | 1262/1694 (74%) | 1483/1694 (87%) | 0.0 
Q9VSL8 | CG6831 PROTEIN (TALIN) - fly), 2836 aa. | 1..2532 | 1..2534 | 1197/2563 (46%) | 1707/2563 (65%) | 0.0 
Q9Y4G6 | KIAA0320 PROTEIN - Homo sapiens (Human), 949 aa (fragment). | 1597..2545 | 1..949 | 947/949 (99%) | 948/949 (99%) | 0.0 



PFam analysis predicts that the NOV21a protein contains the domains shown in the 
Table 21E. 



Table 21E. Domain Analysis of NOV21a 

Pfam Domain | NOV21a Match Region | Identities for the Matched Region | Similarities for the Matched Region | Expect Value 
ubiquitin: domain 1 of 1 | 64..88 | 8/27 (30%) | 20/27 (74%) | 4.3 
Band_41: domain 1 of 1 | 123..316 | 67/211 (32%) | 172/211 (82%) | 1.3e-92 
IRS: domain 1 of 1 | 312..404 | 19/109 (17%) | 46/109 (42%) | 1.2 
I_LWEQ: domain 1 of 5 | 674..768 | 31/98 (32%) | 59/98 (60%) | 11 
transport_prot: domain 1 of 1 | 667..814 | 24/182 (13%) | 88/182 (48%) | 10 
I_LWEQ: domain 2 of 5 | 852..894 | 18/47 (38%) | 31/47 (66%) | 2.4e+02 
Vinculin: domain 1 of 1 | 860..903 | 12/48 (25%) | 30/48 (62%) | 1.3 
I_LWEQ: domain 3 of 5 | 92S..984 | 21/62 (34%) | 37/62 (60%) | 5.9e+04 
TP methylase: domain 1 of 1 | 861..1036 | 26/226 (12%) | 105/226 (46%) | 8 
Apolipoprotein: domain 1 of 1 | 981..1229 | 48/288 (17%) | 141/288 (49%) | 3.5 
CAP: domain 1 of 1 | 917..1354 | 94/557 (17%) | 209/557 (38%) | 4.4 
I_LWEQ: domain 4 of 5 | 1529..1545 | 10/17 (59%) | 13/17 (76%) | 56 
STAT: domain 1 of 1 | 1660..1821 | 35/211 (17%) | 95/211 (45%) | 8.2 
LEA: domain 1 of 1 | 1768..1834 | 15/76 (20%) | 42/76 (55%) | 7 
Histone HNS: domain 1 of 1 | 2232..2356 | 29/143 (20%) | 63/143 (44%) | 3.7 
I_LWEQ: domain 5 of 5 | 2345..2536 | 100/202 (50%) | 183/202 (91%) | 2e-101 
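
The Expect values in the domain table above span many orders of magnitude, so it can help to separate strongly supported domains from weak matches. The sketch below filters a few rows copied from Table 21E; the cutoff of 1e-5 is an assumption chosen for illustration and is not a threshold stated in this document.

```python
# (domain, NOV21a match region, Expect value) for selected rows of Table 21E.
pfam_hits = [
    ("ubiquitin", "64..88", 4.3),
    ("Band_41", "123..316", 1.3e-92),
    ("IRS", "312..404", 1.2),
    ("Vinculin", "860..903", 1.3),
    ("I_LWEQ (domain 5 of 5)", "2345..2536", 2e-101),
]

# Assumed cutoff for illustration only; the document does not specify one.
E_VALUE_CUTOFF = 1e-5

def significant(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits whose Expect value is at or below the cutoff."""
    return [hit for hit in hits if hit[2] <= cutoff]

significant_hits = significant(pfam_hits)
for name, region, e_value in significant_hits:
    print(f"{name}\t{region}\t{e_value:g}")
```

With this assumed cutoff, only the Band_41 domain and the fifth I_LWEQ domain remain from the rows listed.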



Example 22. 

The NOV22 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 22A. 



Table 22A. NOV22 Sequence Analysis 




SEQ ID NO: 77 


2214 bp 


NOV22a, 

CG57551-01 DNA Sequence 


ATTPPTPPPTGPPPPTPGTnPAfSPPGPTYiPPATfirt rrr arc a ra pTrrart aTvir* a p rtp 


iu^lU^.\>nuv^vjL 1 1 r\\* 1 i LV>uuun^l# lux 1 \-J\n\~t\n\**\\j\* L>uUiuv>u LL.v? 1 X Vj 

f^AfjpTnppfjfZPf^n CTnTnrar:pr'OPflr;TrTTTrrLr!rQr:aTrr , Trzir , PTTPTrr , Tnr , B 

CGGGCCGGCTG AC C ATG A APGTnGGOG A CC A GTT PPTGPT P ATGT A r A Pf^rTP P PTT 
LLl vjL-KLjM i \~H\3\jt\\ji\ X Ui 1 ouAuAAuouLnLLuAu 1 1L i 1 LL X LAAlju IbAbL ILL 

CCGAGCTGCGACTCCCAGGGCCTGCATGCGGAGGAGGCCCCATCGTCGGAGCCCCAGA 
GCCCCGTGGCGCAGACATCGGGCTGGCCAGCCTGTAGCACCCCGCTGCCCCTCGTGTC 
GCGGGTG AAG ACGGAGC AG CAGGAGT CGGACTCCGTGCAGTGCATGCCCGTGG CC AAG 
CGGCTGTGGGAC AGTGG C CAGAAGGAGG CTGGGGGCGG CGGCAATGGCAG CCG C AAG A 
TGGCCAAGTTCTCCACXSCCGGACCTGGCTC 

CCAACAGGCTCCGGTGGTGGCAGCAGCCCAGCCCGCCGTGGCTGCGGGAGCAGGGCAG 
CCAGCCGGTGGGGTGGCAGCAGCAGGGGGTGTGGTGAGTGGGCCCAGCACGTCGGAGC 
GGACCAGCCCAGGCACCTCAAGCGCCTACACCAGCGACAGCCCTGGCTCCTACCACAA 
TGAGGAGGACGAGG AGGAGGATGGTGG CG AGGAGGGCATGG ATGAG CAGTAC CGGCAG 
ATCTG CAACATGTACACCATGT ACAGC ATG ATG AACGT CGG CC AG ACAG CCGAGAAGG 
TGGAGGCCCTCCCGGAGCAGGTAGCCCCCGAGTCCCGAAATCGCATCCGGGTTCGGCA 
AGACCTGGCGTCTCTCCCGGCTGAACTTATC^CC^GATTGGGAACCGCTGCCACCCC 
AAGCTCTACGACGAGGGCGACCCCTCTGAGAAGCTGGAGCTGGTGACAGGCACCAACG 
TGTACAT CACAAGGGCGCAGC TGATGAACTGCCACGTCAG CGCAGG CACG CGGCACAA 
GGTCCTACTGCGGCGGCTCCTGGCCTCCTTCTTTGACCGGAACACGCTGGCCAACAGC 
TGCGGCACCGGCATCCGCTCTTCTACCAACGATCCCCGTCGGAAGCCCCTGGACAGCC 
GCGTGCTCCACGCTGTCAAGTACTACTGCC^GAACT^CGCCCCCAACTTCAAGGAGAG 
CGAGATGAATGCCATCGCGGCCGACATGTGCACCAACGCCCGCCGCGTCGTGCGCAAG 
AG CTGG ATG C CC AAGG TCAAGGTG CT CAAGG CTG AGG ATGACG C CT ACA C C AC CTTC A 
TCAGTGAAACGGGC^AGATCGAGCCGGACATGATGGGTGTGGAGCATGGCTTCGAGAC 
CGCCAGCCACGAGGGCGAGGCGGGTCCCATCGCTGAAGCCCTGCAGTAACCCGCCCAG 
CCTCCCGCGGGGCCGC^CACTTCCCCTCCCAAC^ 


C7VTGAGCTACTCTCTGTCCCTCCCCAGGACCCGTC 


CTCTGCCCCTCCTGTCCTACCCCCrTTCCCCACaSAGAGCTGGGCCGGGAGAGGACCG 


CAGGGCAGGTGGCGTGAGGTCOTTCTTGCCTTCT^TAACACACACTC 


GAGTTCTGGCTCCCCAACCTAACCCCTAGCCGTCATCTCCAC^CTCACCAGGCCC^ 


AGGGGAGGGGGCTGGCCT<KMGGTCT^GGGAAGGCCCCTCCCCAGGCCTTAGGCCACC 


TCGCGGAAGCCTTC2U3CCTCCGCCCCTC7VCTG 


CCAGGGGTTCTCAGGACCCCTCCCACCACCTCCCAGTGCTTCCACGTCTCCAAAAG 


CCTTCCTGTCACCCrC^TCTATCCCTGCGCCTCrGGGGCTGGGGTAGGCGAGGCCGTGG 


GGACTACCCATT^TATAGCTGGGGAAACAGGCTCCGAGAAATTGC^CAACCGACCTCA 


GGTGGCCGGC 




ORF Start: ATG at 32 


ORF Stop: TAA at 1613 




SEQ ID NO: 78 


527 aa MW at 57283.8kD 


NOV22a, 

CG57551-01 Protein Sequence 


MAQTLQME I PNFGNS I LECLNEQRLQGLYCDVS VWKGHAFKAHRAVLAAS SS YFRDL 
FNNSRSAWELPAAVQPQSFQQILSFCYTGRLSMNVGDQFLLT1YTAGFLQI QE IMEKG 
TEFFLKVSSPSCDSQGLHAEEAPSSEPQSPVAQTSGWPACSTPLPLVSRVKTEQQESD 
SVQCMPVAJCRLWDSGQKEAGGGGNGSRKMAKFSTPDLAANRPHQPPPPQQAPWAAAQ 
PAVAAGAGQPAGG VAAAGGWSGPSTSERTS PGTSS AYTSDS PGS YHNEEDE EEDGGE 
EGMD EQYRQ I CNMYTMYSMMNVGQT AE KVEAL P EQ VAPE SRNR I R VRQDLAS L P AEL I 
NQIGNRCHPKLYDEGDPSEKLELVTGTNVYITPAQLMNCHVSAGTRHKVLLRRLLASF 
FDRNT LANSOGTG IRS S TND PRRKPLDS R VLHAVKYYCQNFAPNFKE S EMNA I AADMC 
TNARRWRKSWMPFCVKVLKAEDDAYTTFISETGKIEPDMMGVEHGFBTASHEGBAGPI 




AEALQ 



Further analysis of the NOV22a protein yielded the following properties shown in 
Table 22B. 



Table 22B. Protein Sequence Properties NOV22a 

PSort analysis: 0.6000 probability located in nucleus; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.0000 probability located in endoplasmic reticulum (membrane) 

SignalP analysis: No Known Signal Sequence Predicted 



A search of the NOV22a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 22C. 



Table 22C. Geneseq Results for NOV22a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV22a Residues | Match Residues | Identities for the Matched Region | Similarities for the Matched Region | Expect Value 
AAB41621 | Human ORFX ORF1385 polypeptide sequence SEQ ID NO:2770 - Homo sapiens, 228 aa. [WO200058473-A2, 05-OCT-2000] | 300..527 | 1..228 | 228/228 (100%) | 228/228 (100%) | e-131 
ABB17117 | Human nervous system related polypeptide SEQ ID NO 5774 - Homo sapiens, 190 aa. [WO200159063-A2, 16-AUG-2001] | 409..501 | 1..93 | 64/94 (68%) | 73/94 (77%) | 7e-29 
AAG78615 | Human zinc finger transcription factor BioZFTF45 - Homo sapiens, 413 aa. [CN1299825-A, 20-JUN-2001] | 5..159 | 7..170 | 62/164 (37%) | 92/164 (55%) | 2e-25 
AAY73351 | HTRM clone 1484257 protein sequence - Homo sapiens, 810 aa. [WO9957144-A2, 11-NOV-1999] | 7..291 | 1..277 | 83/291 (28%) | 124/291 (42%) | 8e-18 
AAM41058 | Human polypeptide SEQ ID NO 5989 - Homo sapiens, 804 aa. [WO200153312-A1, 26-JUL-2001] | 7..291 | 2..271 | 84/295 (28%) | 123/295 (41%) | 2e-17 

In a BLAST search of public sequence databases, the NOV22a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 22D. 



Table 22D. Public BLASTP Results for NOV22a 

Protein Accession Number | Protein/Organism/Length | NOV22a Residues | Match Residues | Identities for the Matched Portion | Similarities for the Matched Portion | Expect Value 
Q96RE7 | NAC1 PROTEIN - Homo sapiens (Human), 527 aa. | 1..527 | 1..527 | 526/527 (99%) | 526/527 (99%) | 0.0 
O35260 | NAC-1 PROTEIN - Rattus norvegicus (Rat), 514 aa. | 1..527 | 1..514 | 462/530 (87%) | 475/530 (89%) | 0.0 
Q9CZ72 | 4930511N13RIK PROTEIN - Mus musculus (Mouse), 514 aa. | 1..527 | 1..514 | 462/530 (87%) | 476/530 (89%) | 0.0 
Q96BF6 | SIMILAR TO RIKEN CDNA 0610020102 GENE - Homo sapiens (Human), 587 aa. | 1..501 | 1..478 | 289/522 (55%) | 335/522 (63%) | e-140 
AAH22103 | RIKEN CDNA 0610020102 GENE - Mus musculus (Mouse), 586 aa. | 1..485 | 1..459 | 281/502 (55%) | 327/502 (64%) | e-139 
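
In the BLASTP rows above, the aligned length (the denominator of the identity fraction) can exceed the residue span of either sequence; the difference is the number of gap columns placed in that sequence. A minimal sketch of the arithmetic follows, using the rat NAC-1 row (query 1..527, match 1..514, 530 aligned columns); the function names are hypothetical.

```python
def span_length(residue_range: str) -> int:
    """Number of residues covered by a range written as 'start..end'."""
    start, end = (int(part) for part in residue_range.split(".."))
    return end - start + 1

def gap_columns(alignment_length: int, residue_range: str) -> int:
    """Alignment columns in which this sequence contributes a gap."""
    return alignment_length - span_length(residue_range)

alignment_length = 530  # denominator of 462/530 in the rat NAC-1 row
query_gaps = gap_columns(alignment_length, "1..527")
match_gaps = gap_columns(alignment_length, "1..514")
print(query_gaps, match_gaps)
```

A negative result would indicate that the range or the fraction was mistranscribed, since a sequence cannot contribute more residues than there are alignment columns.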



PFam analysis predicts that the NOV22a protein contains the domains shown in the 
Table 22E. 



Table 22E. Domain Analysis of NOV22a 

Pfam Domain | NOV22a Match Region | Identities for the Matched Region | Similarities for the Matched Region | Expect Value 
BTB: domain 1 of 1 | 14..124 | 40/143 (28%) | 88/143 (62%) | 6.2e-23 



Example 23. 

The NOV23 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 23A. 



Table 23A. NOV23 Sequence Analysis 




SEQ ID NO: 79 1497 bp 


NOV23a, 

CG57411-01 DNA Sequence 


ATGGCCACTGCACAGGTGG AACTGGTG CAGGGTGGTCCCCGGG CTCCAGTAGGGGAG A 
AGCTGGAGCTCGTCCTCTCGAACCTGC^GGCAGACGTCCTGGAGTTGCTGCTGGAGTT 
TGTCTACACGGGCTCCCTGGTCATCGACTCGGCCAACGCCAAGACACTGCTGGAGGCG 
GCCAGCAAGTTCCAGTTCCACACCTTCTGCAAAGT 

AGCTGACGGCCAGCAACTGCCTGGGCGTGCTGGCCATGGCCGAGGCCATGCAGTGCAG 
CGAGCTCTACCACATX^CCAAGGCCTTCGCGCTGCAGATCTTCCCCGAGGTGGCCGCC 
CAGGAGGAGATCCTCAGCATCTCCAAGGACGACTTCATCGCCTACGTCTCCAACGACA 
GCCTCAACACCAAGGCTGAGGAGCTGGTGTACGAGACAGTCATCAAGTGGATCAAGAA 
GGACCCCGCGACACGCACACAGCTGCAGTACGCGGCTGAGCTCCTGGCCGTGGTCCGC 
CTCCCCTTCATCCACCCCAGCTACCTGCTCAATGTGGTTGACAATGAAGAGCTGATCA 
AGTCATCAGAAGCCTGCCGGGACCTGGTGAACGAGGCCAAACGCTACCATATGCTGCC 
CCACGCCCGCCAGGAGATGCAGACGCCCCGAACCCGGCCGCGCGTCCCTGCAGGTGTG 
GCTGAGGTCATCGTCTTGGTTGGGGGCCGTCAGATGGTGGGGATGACCCAGCGCTCGC 
TGGTGGCCGTCACCTGCTGGAACCCGCAGAACAACAAGTGGTACCCCTTGGCCTCGCT 
GCCCTTCTATGACCGCGAGTTCTTCAGTGTAGTGAGTGCAGGGGACAACATCTACCTC 
TCAGGTGGGATGGAATCAGGGGTGACGCTGGCTGATGTCTGGTGCTACATGTCCCTGC 
TTGATAACTGGAACCTCGTCTCCAGAATGACAGTCCCCCGCTGTCGGCACAATAGCCT 
CGT CT ACG ATGGGAAGATTT ACACCCTCGGGGG ACTTGGCGTGG CAGG CAACGTGG AC 
CACGTGGAGGTCCCTGCAGGTGTGGCTGAGGTCATCGTCTTGGTTGGGGGCCGTCAGA 
TGGTGGGGATGACCCAGCGCTCGCTGGTGGCCGTCACCTGCTGGAACCCGCAGAACAA 
CAAGTGGTACCCCTTGGCCTCGCTGGGTGGGATGGAATCAGGGGTGACGCTGGCTGAT 
GTCTGGTGCTACATGTCCCTGCTTGATAACTGGAACCTCGTCTCCAGAATGACAGTCC 
CCCG CTGTCGGCACAATAG CCTCGTCTACGATGGGAAGATTTACAC CCTCGGGGGACT 
TGG CGTGGCAGG CAACGTGGACCACGTGG AGGCCTACGAGCCCACAACCAACACATGG 
ACCCTCCTCCCCCACATGCCCTGCCCTGTGTTCAGACACGGCTGCGTCGTGATAAAGA 
AATATATTCAAAGCGGCTGACATCAGCAGAAAGCCCACGATAAGACT 




ORF Start: ATG at 1 


ORF Stop: TGA at 1468 




SEQ ID NO: 80 


489 aa 


MW at 54208.2kD 


NOV23a, 

CG57411-01 Protein Sequence 


MATAQVELVQGGPRAPVGEKLELVLSNLQADVLELLLEFVYTGSLVIDSANAKTLLEA 

ASKFQFHTFCKVCVS FLEKQLTASNCLGVLAMAEAMQCS ELYHMAKAFALQI FPEVAA 

QEEI LS I SKDDFI AYVSNDSLNTKAEELVYETVI KWIKKDPATRTQLQYAAELLAVVR 

LP F I H P S YLLNWDNEE LI KS S EACRDLVNEAKR YHML PHARQEMQT P RTR PR VP AG V 

AEVIVLVGGRQMVGMTQRSLVAVTCWNPQNNKWYPLASLPFYDREFFSWSAGDNIYL 

SGGMESGVTLADWCYMSLLDNWNLVSRMTVPRCRHNSLVYDGKIYTLGGLGV 

HVE VP AG VAE V I VLVGGRQMVGMTQRS L VAVTCWN PQNNKWY P LAS LGGMESGVT LAD 

WCYMSLUDNWNLVSRKI^PRCRHNSLVYDGKIYTLGGI^ 

TLLPHMPCPVFRHGCWIKKYIQSG 
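
The relationship between the DNA sequence and the predicted polypeptide can be illustrated by translating the first codons of the NOV23a open reading frame, which begins at position 1. The sketch below assumes the standard genetic code and strips the spaces introduced into the printed sequence; the function name `translate` is illustrative, and the code raises `KeyError` on any character other than A, C, G or T.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    seq = "".join(dna.split()).upper()  # drop spaces and line breaks
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Start of the NOV23a DNA sequence as printed in Table 23A.
nov23a_start = "ATGGCCACTGCACAGGTGG AACTGGTG CAGGGTGG"
print(translate(nov23a_start))
```

The result can be compared with the opening residues of the NOV23a protein sequence listed above.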



Further analysis of the NOV23a protein yielded the following properties shown in 
Table 23B. 



Table 23B. Protein Sequence Properties NOV23a 

PSort analysis: 0.6500 probability located in cytoplasm; 0.2271 probability located in lysosome 
(lumen); 0.1000 probability located in mitochondrial matrix space; 0.0000 
probability located in endoplasmic reticulum (membrane) 

SignalP analysis: No Known Signal Sequence Predicted 



A search of the NOV23a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 23C. 

Table 23C. Geneseq Results for NOV23a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV23a Residues | Match Residues | Identities for the Matched Region | Similarities for the Matched Region | Expect Value 
AAB40940 | Human ORFX ORF704 polypeptide sequence SEQ ID NO: 1408 - Homo sapiens, 335 aa. [WO200058473-A2, 05-OCT-2000] | 19..351 | 4..334 | 317/333 (95%) | 320/333 (95%) | e-180 
AAM38711 | Human polypeptide SEQ ID NO 1856 - Homo sapiens, 574 aa. [WO200153312-A1, 26-JUL-2001] | 22..472 | 78..559 | 151/488 (30%) | 222/488 (44%) | 2e-61 
AAB43090 | Human ORFX ORF2854 polypeptide sequence SEQ ID NO:5708 - Homo sapiens, 506 aa. [WO200058473-A2, 05-OCT-2000] | 22..468 | 9..487 | 150/491 (30%) | 241/491 (48%) | 3e-59 
AAM38956 | Human polypeptide SEQ ID NO 2101 - Homo sapiens, 587 aa. [WO200153312-A1, 26-JUL-2001] | 22..468 | 90..568 | 149/491 (30%) | 240/491 (48%) | 1e-58 
AAM94018 | Human stomach cancer expressed polypeptide SEQ ID NO 106 - Homo sapiens, 568 aa. [WO200109317-A1, 08-FEB-2001] | 25..470 | 76..553 | 148/490 (30%) | 231/490 (46%) | 3e-56 



In a BLAST search of public sequence databases, the NOV23a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 23D. 



Table 23D. Public BLASTP Results for NOV23a 

Protein Accession Number | Protein/Organism/Length | NOV23a Residues | Match Residues | Identities for the Matched Portion | Similarities for the Matched Portion | Expect Value 
Q96CT2 | HYPOTHETICAL 76.8 KDA PROTEIN - Homo sapiens (Human), 707 aa (fragment). | 19..489 | 203..707 | 390/507 (76%) | 406/507 (79%) | 0.0 
Q96PW7 | KIAA1921 PROTEIN - Homo sapiens (Human), 545 aa (fragment). | 19..489 | 41..545 | 390/507 (76%) | 406/507 (79%) | 0.0 
Q96BF0 | SIMILAR TO HYPOTHETICAL PROTEIN FLJ14106 - Homo sapiens (Human), 503 aa. | 19..351 | 172..502 | 329/333 (98%) | 330/333 (98%) | 0.0 
Q9D5K3 | 4930429H24RIK PROTEIN - Mus musculus (Mouse), 484 aa. | 33..485 | 1..477 | 165/492 (33%) | 248/492 (49%) | 2e-66 
Q9UH77 | Kelch-like protein 3 - Homo sapiens (Human), 587 aa. | 22..468 | 90..568 | 150/491 (30%) | 241/491 (48%) | 1e-58 



PFam analysis predicts that the NOV23a protein contains the domains shown in the 
Table 23E. 



Table 23E. Domain Analysis of NOV23a 

Pfam Domain | NOV23a Match Region | Identities for the Matched Region | Similarities for the Matched Region | Expect Value 
BTB: domain 1 of 1 | 4..79 | 24/143 (17%) | 53/143 (37%) | 3.7 
Kelch: domain 1 of 4 | 223..272 | 9/50 (18%) | 28/50 (56%) | 0.94 
Kelch: domain 2 of 4 | 275..320 | 11/47 (23%) | 27/47 (57%) | 0.016 
Kelch: domain 3 of 4 | 322..396 | 14/75 (19%) | 44/75 (59%) | 3.3e-05 
Kelch: domain 4 of 4 | 426..471 | 19/47 (40%) | 35/47 (74%) | 7.2e-10 



Example 24. 

The NOV24 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 24A. 



Table 24A. NOV24 Sequence Analysis 




SEQ ID NO: 81 4268 bp 


NOV24a, 

CG57399-01 DNA Sequence 


ATOACCTGGGACACAGCTCTCTGGACCTCAGTTTTTCTGATTGGGCTCCTTCCTACCC 
TTGGTTTCGCTAATTGCATCCTCCAGACTTCTGGTAAAATGTGTACTTTAAGAGGTAG 
ATACCCCCAGCCCCCACAACCACCTCTCTGCTTGTCTCCCCTAGTCCACCAGCTCCGA 
CCAGCAGACATCAAAGTGGTGGCCGCCCTGGGTAATGATGAAACCTTCCAGGAAAGTG 
GTGCAGGGCAGCTAAGTGAGCCTGACCCCAGGCAGTGGTCCTGGCCACAGGCCTGCTT 
GCCTGGGGTAAAAAAGGAAATGCAAGATGTGGTAGGTGAGAGAACGCCGAGCCGTCGC 
CGCAGCCTCCGCCGCCGAGAAGCCCTTGTTCCCGCTGCTGGGAAGGAGAGTCTGTGCC 
GACAAGATATTTTCATTTCCTTGTTGGAAATTATCAAGCATTTTCCTCCCTCCCCTCA 
GGACATCAACCTCGAGAAAGACTGGAAGCTGGTCACACTCTTCATTGGGGTCAACGAC 
TTGTGTCATTACTGTCCACTTGTTCAGGG CCC CGTTAT AG ACCTGGGTGGG ATGGATA 
CCCTCCACTCCCTGCAGCTCCCAAGGGCTTTCGTCAACGTGGTGGAGGTCATGGAGCT 
GGCTAGCCTGTACCAGGGCCAAGGCGGGAAATGTGCCATGCTGGCAGCTCAGGAAGCC 
TGGAACAGCCTCCTGGCCTCCAGCAGGTACAGTGAGCAGGAGTCCTTCACCGTGGTTT 
TCCAGCCTTTCTTCTATGAGACCACCCCATCTGACCCCCGACTCCAGGATTCTACCAC 
GCTGG CCTGGCATCTCTGGAATAGG ATGATGG AG C CAGCAGGAGAGAAAG ATGAGCCA 
TTGAGTGTAAAACACGGGAGGCCAATGAAGTGTCCCTCTCAGGAGAGCCCCTATCTGT 
TCAGCTACAGAAACAGCAACTACCTX3ACCAGACTGCAGAAACCCCAAGACAAGCTTGT 
AAGAGAAGGAGCGGAAATCAGATGTCCTGACAAAGACCCCTCCGATACGGTTCCCACC 
TCAGTTCATAGGCTGAAGCCGGCTGACATCAACGTAATTGGAGCCCTGGGTGACTCTC 
TCACGGCAGGCAATGGGGCCGGGTCCACACCTGGGAACGTCTTGGACGTCTTGACTCA 
GTACCG AGGCCTGTCCTGGAGOGTCGG CGGAGATGAG AACATCGG CACCGTTACCACC 
CTGGCAGACATCCTCCGGGAATTCAACCCTTCCCTGAAGGGCTTCTCTGTTGGCACTG 
GGAAAGAAACCAGTCCTAATGCCTTCTTAAACCAGGCTGTGGCAGGAGGCCX3AGCTGA 
GCAGGCCAGGAGGCTGGTGGACCTGATGAAGAATGACACGAGGATACACTTTCAGGAA 
GACTGGAAGATAATAACCCTGTTTATAGGCGGCAATGACCTCTGTGATTTCTGCAATG 
ATCTGGTACACTATTCTCCCCAGAACTTCACAGACAACATTGGAAAGGCCCTGGACAT 
CCTCCATGCTGAGTCTCAGGTTCCTCGGGCATTTGTGAACCTGGTGACGGTGCTTGAG 
ATCGTCAACCTG AGGGAGCTGTACCAGGAG AAAAAAGTCTACTG CC CAAGG ATG ATCC 
T CAGGTCACTGTGT CCCTGTGT C CTG AAGTTTG ATGATAACTCAACAG AACTTG CTAC 
CCTCATCGAATTCAACAAGAAGTTTCAGGAGAAGACCCACCAACTGATTGAGAGTGGG 
CGATATGACACAAGGGAAGATTTTACTGTGGTTGTGCAGCCGTTCTTTGAAAACGTGG 

acatgcovaagacccaggaaggattgcctgacmctctttcttcgctcctgactgttt 

ccacttcagcagcaagtctcactcccgagcagccagtgctctctggaacaatatgctg 

gagcctgttggccagaagacgactcgtcataagtttgaaaacaagatcaatatcacat 

gtccgtcacaggtccag ccgtttctgaggacctac aag aac ag catgcagggtcatgg 

gacctggctgccatgcagggacagagccccttctgccttgcaccctacctcagtgcat 

gccctgagacctgcagacatccaagitgtggctgctctgggggattctctgaccgctg 

gcaatggaattggctccaaaccagacgacctccccgatgtcaccacacagtatcgggg' 

actgtcatacagtgcaggaggggacggctccctggagaatgtgaccaccttacctagt 

tctatccttcgggagtttaacagaaacctcacaggctacgccgtgggcacgggtgatg 

ccaatg acacg aatg c attcctcaatcaagctgttcccgg ag caaaggctagggatct 

tatgagccaagt c caaactctgatgcagaagatgaaagatgatc atag agtaaatttc 

catcaagactggaaggtcatcacagtgctgatcggaggcagcgatttatgtgactact 

GCACAGATTCGAATCTGTATTCTGCAGCCAACTTTGTTCACCATCTCCGCAATGCCTT 
GGACGTCCTGCATAGAGAGGTGCCCAGAGTCCTGGTCAACCTCGTGGACTTCCTGAAC 
CCCACTATCATGCGGCAGGTGTTCCTGGGAAACCCAGACAAGTGCCCAGTGCAGCAGG 
CCAGCGTTTTGTGTAACTGCGTTCTGACCCTGCGGGAGAACTCCCAAGAGCTAGCCAG 
GCTGGAGGCCTTCAGCCGAGCCTACCAGAGCAGCATGCGCGAGCTGGTGGGGTCAGGC 
CGCTATGACACGCAGGAGGACTTCTCTCTGGTGCTGCAGCCCTTCTTCCAGAACATCC 
AGCTCCCTGTCCTGCAGGATGGGCTCCCAGATACGTCCTTCTTTGCCCCAGACTGCAT 
CCACCCAAATCAGAAATTCCACTCCCAGCTGGCCAGAGCCCTTTGGACCAATATGCTT 
GAACCACTTGGAAG CAAAACAGAGACCCTGGACCTGAG AGCAGAGATGCC CATCACCT 
GTCCCACTCAG AATGAG CCCTTCCTGAG AACCCCTCGG AATAGT AACTACACGT AC CC 
CATCAAG CCAGCCATTG AG AACTGGGGCAGTG ACTTCCTGTGT ACAG AGTGGAAGG CT 
TCCAATAGTGTTCCAACCTCTGTCCACCAGCTCCGACCAGCAGACATCAAAGTGGTGG 
CCGCCCTGGGTGACTCTCTGACTACAGCAGTGGGAGCTCGACCAAACAACTCCAGTGA 
CCTACCCACATCTTGGAGGGGACTCTCTTGGAGCATTGGAGGGGATGGGAACTTGGAG 
ACTCACACCACACTGCCCAGTATTCTGAAGAAGTTCAACCCTTACCTCCTTGGCTTCT 
CTACCAGCAC CTGGG AGGGG ACAGCAGGACTAAATGTGGCAG CGGAAGGGGCCAG AG C 
T AGGAGGG ACATGCCAGCCCAGGCCTGGG ACCTGGTAGAG CG AATGAAAAACAGCCC C 
ATACACTTTCAGGAAGACTGGAAGATAATAACCCTGTTTATAGGCGGCAATGACCTCT 
GTGATTTCTGCAATGATCTGGTAGGTGAATATGTTCAGCACATCCAACAGGCCCTGGA 
CATCCTCTCTGAGGAGCTCCCAAGGGCTTTCGTCAACGTGGTGGAGGTCATGGAGCTG 
GCTAGCCTGTACCAGGGCCAAGGCGGGAAATGTGCCATGCTGGCAGCTCAGAACAACT 
GCACTTGCCT C AG ACACT CG CAAAG CT C C CTGGAG AAG CAAG AACTG AAG AAAGTGAA 
CTGGAACCTCCAGCATGGCATCTCCAGTTTCTCCTACTGGCACCAATACACACAGCGT 
GAGGACTTTGCGGTTGTGGTGCAGCCTTTCTTCCAAAACACACTCACCCCACrrGAACA 
GAGGGGACACTGACCTC^CCTTCTTCTCCGAGGACTGTTTTCACTTCTCAGACCGCGG 
GCATGCCGAGATGGCCATCGCACTCTGGAACAACATGCTGGAACCAGTGGGCCGCAAG 
ACTACCTCCAACAACTTCACCCACAGCCGAGCCAAACTCAAGTGCCCCTCTCCTGTGA 
GTCCTTACCTCTACACCCTGCGGAACAGCCGATTGCTCCCAGACCAGGCTGAAGAAGC 
CCCCGAGGTGCTCTACTGGGCTGTCCCAGTGGCAGCGGGAGTCGGCCTTGTGGTGGGC 
ATCATCGGGACAGTGGTCTGGAGGTGCAGGAGAGGTGGCCGGAGGGAAGATCCTCCAA 
TGAGCCTGCGCACTGTGGCCCTCTAGGCCCGGGG 




ORF Start: ATG at 1 


ORF Stop: TAG at 4258 




SEQ ID NO: 82 


1419 aa 


MW at 158435.1kD 


NOV24a, 

CG57399-01 Protein Sequence 


MTWDTALWTS VFL IGLLPTLGFANCI LQTSGKMCTLRGRY PQ P PQPPLCLS PLVHQLR 

PADIKWAALGNDETFQESGAGQLSEPDPRQWSWPQACLPGVKKEMQDWGERTPSRR 

RSLRRREALV P AAG KES LCRQD I F I S LLE 1 1 KHFP PS PQD I NL EKDWKL VTLF I G VND 

LCH YC P LVQG P V I DLGGMDTLHS LQLPRAFVNWE VME LAS L YQGQGG KCAMLAAQEA 

WNSLLASSRYSEQESFTWFQPFFTETTPSDPRIiQDSTTLAWHLWNRMMEPAGEKDEP 

LSVKHGRPMKCPSQESPYLFSYRNSNYLTRLQKPQDKLVREGAEIRCPDKDPSDTVPT 

SVHRLKPADINVIGALGDSLTAGNGAGSTPGNVLDVLTQYRGLSWSVGGDENIGTVTT 

LADI LREFNPS LKGFS VGTGKETS PNAFLNQAVAGGRAEQARRLVDLMKNDTRIHFQE 

DWKI I TLFIGGNDLCDFCNDLVHYS PQNFTDNI GKALD I LHAES QVPRAFVNLVTVLE 

I VNLRELYQEKKVYC PRMI LRSLCPCVLKFDDNSTELATLI EFNKKFQEKTHQLI ESG 

RYDTREDFTVWQPFFENVDMPKTQEGLPDNSFFAPDCFHFSSKSHSRAASALWNNML 

EPVGQKTTRHKFENKINITCPSQVQPFLRTYKNSMQGHGTWLPCRDRAPSALHPTSVH 

ALRPADI QWAALGDSLTAGNG IGS KPDDLPDVTTQYRGLS YS AGGDGS LENVTTLPS 

S I LRE FNRNLTGYAVGTGDANDTNAFLNQAVPGAKARDLMSQVQTLMQKWKDDHRVNF 

HEDWKV I TVL IGG SDLCD YCTDS NL YS AANFVHHLRNAIiDVLHRE VPR VL VNLVD F LN 

PTI MRQVFLGNPDKC PVQQAS VLCNCVLTLRENSQELARLEAFS RAYQSSMRELVGSG 

RYDTQEnDFSWLQPFFQNIQLPVLQIXSLPDTSFFAPDCIHPNQKra 

E PLGS KTETLDLRAEMPI TCPTQNE PFLRTPRNSNYTYPI KPAI ENWGSDFLCTEWKA 

SNS VPTS VHQLRPAD I KWAALGDS LTTAVGARPNNSSDLPTSWRGLSWS IGGDGNLE 
THTTLPS I LKKFNPYLIjGFSTSTWEGTAGLNVAAEGARARRDMPAQAWDLVERMKNS P 
IHFQEDWKI ITLFIGGNDLCDFCNDLVGEYVQHIQQALDI LSEELPRAFVNWEVMEL 
AS L YQGQGGKCAMLAAQNNCTC LRHSQS S LEKQE LKKVNWNLQHG I S S FS YWHQ YTQR 
EDFAVWQPFFQNTLTPLNRGDTDLTFFSEDCFHFSDRGHAEMAIALWNNMLEPVGRK 
TTSNNFTHSRAKLKCPSPVSPYLYTLRNSRLIiPDQAEEAPEVLYWAVPVAAGVGLWG 
1 1 GT WWRCRRGGRRED P PMS LRTVAL 



SEQ ID NO: 83 



1624 bp 



NOV24b, 

CG57399-02 DNA Sequence 



GCCGG CTG ACATCAATGTAATTGG AGCCCTGGGTG ACT CTCTCACGGCAGGCAATGGG 



GCCGGGTCCACACCTGGGAACGTCTTGGACGTCTTGACTCAGTACCGAGGCCTGTCCT 
GGAGCGTCGGCGGAGATGAGAACATCGGCACCGTTACCACCCTGGCGAACATCCTCCG 
GGAATTCAACCCTTCCCTGAAGGGCTTCTCTGTTGGCACTGGGAAAGAAACCAGTCCT 



AATGCCTTCTTAAACCAGGCTGTGGCAGGAGGCCGAGCTGAGGATCTACCTGTCCAGG 
CCAGGAGGCTGGTGGACCTG ATGAAGAATGACACGAGGATACACTTTCAGGAAGACTG 
GAAGATAATAACCCTGTTTATAGGCGGCAATGACCTCTGTGATTTCTGCAATGATCTG 
GTCCACTATTCTCCCCAGAACTTCACAGACAACATTGGAAAGGCCCTGGACATCCTCC 
ATGCTGAGGTTCCTCGGGCATTTGTGAACCTGGTGACGGTGCTTGAGATCGTCAACCT 
GAGGGAGCTGTACCAGGAGAAAAAAGTCTACTGCCCAAGGATGATCCTCAGGTCTCTG 
TGTCCCTGTGTCCTGAAGTTTGATGATAACTCAACAGAACTTGCTACCCTCATCGAAT 
TCAACAAGAAGTTTC71GGAGAAGACCCACCAACTGATTGAGAGTGGGCGATATGACAC 
AAGGGAAG ATTTT ACTGTGGTTGTGC AG CCGTTCTTTG AAAACGTGGACATG CCAAAG 
ACCTCGGAAGGATTGCCTGACAACTCTTTCTTCGCTCCTGACTGTTTCCACTTCAGCA 
GCAAGTCT CACTCCCGAGCAGCCAGTGCTCTCTGGAACAATATG CTGGAG CCTGTTGG 
C C AGAAG A CG ACT CGTC AT AAG TTTG AAAACAAG AT C AAT ATCACATG TC CG AACCAG 
GTCCAGCCGTTTCTGAGGACCTACAAGAACAGCATGCAGGGTCATGGGACCTGGCTGC 
CATGCAGGGACAGAGCCCCTTCTGCCTTGCACCCTACCTCAGTGCATGCCCTGAGACC 
TG CAGACATCCAAGTTGTGGCTGCTCTGGGGG ATTCTCTGACCG CTGGC AATGGAATT 
GGCTCCAAACCAGACGACCTCCCCGATGTCACCACACAGTATCGGGGACTGTCATACA 
GAGAAAGTAAACCAGGGTTCTTATCAGACTCCTGGGTCAGCAAATCCAACAGGAAATG 
CACCAGAAAAGCACCAAATC CCTG AATCTTCACCTC CCCG CTTGCATGT ATACGTGTA 
CACGTGGTGTTCCTACXSTCTCTGTTTACTGTCTTTATGTGTTTATTCATGT^ 



TAGTCACACAGCTGCCTTTACATATATGTACACATCTGCACAGAAAACCTCTGAAACC 
CATCGCACACTTCG AGAGGCCATAACCAAG ACACAATCACAAT CAGCCATGT CTTGAA 
AG ATT AG C AATT CG ACAAG AG GAAAGGG TG AG AAAGGG CATC C CG AACACGGAAGTGG 
AGAAGCTCAGGGTGTGTCAGG CGAGCGGTTGCGTGT AGAT ATT CTCAAGTTT CTTTCT 
CrCCTAATAAAGTTCTCATTC CTGTAGGCTTCAAAGTAAGTGG CGAGT AGCTCAG AAT 



ORF Start: ATG at 311 



ORF Stop: TGA at 1241 



SEQ ID NO: 84 



310 aa 



MW at 35240.6kD 



NOV24b, 

CG57399-02 Protein Sequence 



MKNDTRI HFQEDWKI I TLFIGGNDLCDFCNDLVHYS PQNFTDNI GKALD I LHAEVPRA 
FVNLVTVLEI VNLRELYQEKKVYCPRMI LR S LC PCVLKFDDNS T E LATLI EFNKKFQE 
KTHQLIESGRYDTREDFTVWQPFFENVDMPKTSEGLPDNSFFAPDCFHFSSKSHSRA 
ASALWNNMLE PVGQKTTRHKFENKI NITCPNQVQPFLRTYKNSMQGHGTWLPCRDRAP 
SALHPTSVHALRPADIQWAALGDSLTAGNGIGSKPDDLPDVTTQYRGLSYRESKPGF 
LSDSWVS KSNRKCTRKAPNP 



SEQ ID NO: 85 



4425 bp 



NOV24c, 

CG57399-03 DNA Sequence 



CTGGAGCATTCTGGCATGGGGCTGCGGCCAGGCATTTTCCTCCTGGAGCTGCTGCTGC 



TTCTGGGGCAAGGTACCCCTCAGATCCATACCTCTCCTAGAAAGAGTACATTGGAAGG 
GCAGCTATGGCCAGAGACAGTTCACTCTCTGAAGCCTTCTGATATTAAATTTGTGGCA 
GCCATTGGCAATCTGGAAATTGTGCCAGACCCAGGGACGGGCGATCTGGAGAAGCAAG 
ACGAAAGGCCACAGCAGGTGTGCATGGGAGTGATGACAGTCCTTTCAGACATCATCAG 
ATATTTCAGTCCTTCTGTTCCAATGCCTGTGTGCCACACTGGAAAGAGAGTCATACCC 
CACGATGGTGCTGAGGACTTGTGGATTCAGGCTCAAGAACTGGTGAGAAACATGAAAG 
AGAACCAACTTGACTTTCAATTTGACTGGAAGCTCATCAATGTGTTCTTCAGTAATGC 
AAGCCAGTGTTACCTGTGCCCCTCTGCTCAACAGAATGGGCTTGCGGCGGGCGGCGTG 
GATGAGCTGATGGGGGTGCTGGACTACCTGCAGCAGGAGGTGCCCAGAGCATTTGTAA 
ACCTGGTGGACCTCTCTGAGGTTGCAGAGGTCTCTCGTCAGTATCACGGCACTTGGCT 
CAGCCCTGCACCAGAGCCCTGTAATTGCTCAGAGGAGACCACCCGGCTGGCCAAGGTG 
GTGATGCAGTGGTCTTATCAGGAAGCCTGGAACAGCCTCCTGGCCTCCAGCAGGTACA 
GTCAGCAGGAGTCCTTCACCGTGGTTTTCCAGCCTTTCTTCTATGAGACCACCCCATC 
TGACCCCCGACTCCAGGATTCTACCACGCTGGCCTGGCATCTCTGGAATAGGATGATG 
G AGC CAG CAGG AGAG AAAG ATG AG C C ATTG AG TGT AAAAC ACGGG AGG CC AATG AAG T 
GTCCCTCTCAGGAGAGCCCCTATCTGTT C AG CTACAGAAACAGC AACTACCTG ACCAG 
ACTGCAGAAACCCCAAGACAAGCTTGAGGTAAGAGAAGGAGCGGAAATCAGATGTCCT 
GACAAAGACCCCTCCGATACGGTTCCCACCTCAGTTCATAGGCTGAAGCCGGCTGACA 
TCAACGTAATTGGAGCCCTGGGTGACTCTCTCACGGCAGGCAATGGGGCCGGGTCCAC 
ACCTGGGAACGTCTTGGACGTCTTGACTCAGTACCGAGGCCTGTCCTGGAGCGTCGGC 
GG AGATG AGAACATCGG CACCGTT AC CACCCTGG CGG ACATCCTCCGGG AATTCAACC 
CTTCCCTGAAGGGCTTCTCTGTTGGCACTGGGAAAGAAACCAGTCCTAATGCCTTCTT 
AAACCAGGCTGTGGCAGGAGGCCGAGCTGAGCAGGCCAGGAGGCTGGTGGACCTGATG 
AAGAATGACACGAGGATACACTTTCAGGAAGACTGGAAGATAATAACCCTGTTTATAG 
GCGGCAATGACCTCTGTGATTTCTGCAATGATCTGGTACACTATTCTCCCCAGAACTT 
CACAGACAACATTGGAAAGGCCCTGGACATCCTCCATGCTGAGGTTCCTCGGGCATTT 
GTGAACCTGGTGACGGTGCTTGAGATCGTCAACCTGAGGGAGCTGTACCAGGAGAAAA 
AAGTCTACTGCCCAAGGATGATCCTCAGGTCACTGTGTCCCTGTGTCCTGAAGTTTGA 
TGATAACTCAACAGAACTTGCTACCCTCATCGAATTCAACAAGAAGTTTCAGGAGAAG 
ACCCACCAACTGATTGAGAGTGGGCGATATGACACAAGGGAAGATTTTACTGTGGTTG 
TGCAGCCGTTCTTTGAAAACGTGGACATGCCAAAGACCCAGGAAGGATTGCCTGACAA 
CTCTTTCTTCGCTCCTGACTGTTTCCACTTCAGCAGCAAGTCTCACTCCCGAGCAGCC 
AGTGCTCTCTGGAACAATATGCTGGAGCCTGTTGGCCAGAAGACGACTCGTCATAAGT 
TTGAAAACAAGATCAATATCACATGTCCGAACCAGGTAGAGTGGCCGTTTCTGAGGAC 
CT ACAAGAACAGCATG CAGGGTCATGGGAC CTGGCTG CCATG CAGGGACAGAGCCCCT 
TCTGCCTTGCACCCTACCTCAGTGCATGCCCTGAGACCTGCAGACATCCAAGTTGTGG 
CTGCTCTGGGGGATTCTCTGACCGCTGGCAATGGAATTGGCTCCAAACCAGACGACCT 
CCCCGATGTCACCACACAGTATCGGGGACTGTCATACAGTGCAGGAGGGGACGGCTCC 
CTGGAGAATGTGACCACCTTACCTGATATCCTTCGGGAGTTTAACAGAAACCTCACAG 
GCTACGCCGTGGGCACGGGTGATGCCAATGACACGAATGCATTCCTCAATCAAGCTGT 
TCCCGGAGCAAAGGCTAGGGATCTTATGAGCCAAGTCCAAACTCTGATGCAGAAGATG 
AAAGATGATCATAGAGTAAATTTCCATGAAGACTGGAAGGTCATCACAGTGCTGATCG 
GAGGCAGCGATTTATGTGACTACTGCACAGATTCGAATCTGTATTCTGCAGCCAACTT 
TGTTCACCATCTCCGCAATG CCTTGGACGTCCTG CATAG AG AGGTGCCC AGAGTCCTG 
GTCAACCTCGTGGACTTCCTGAACCCCACTATCATGCGGCAGGTGTTCCTGGGAAACC 
CAGACAAGTGCCCAGTGCAGCAGGCCAGCGTTTTGTGTAACTGCGTTCTGACCCTGCG 
GGAGAACTCCCAAGAGCTAGCCAGGCTGGAGGCCTTCAGCCGAGCCTACCAGAGCAGC 
ATGCGCGAGCTGGTGGGGTCAGGCCGCTATGACACGCAGGAGGACTTCTCTGTGGTGC 
TGCAGCCCTTCTTCCAGAACATCCAG CTCCCTGT CCTG CAGGATGGGCTCCCAG ATAC 
GTCCTTCTTTGCCCCAGACTGCATCCACCCAAATCAGAAATTCCACTCCCAGCTGGCC 
AGAGCCCTTTGG ACCAAT ATGCTTGAACCACTTGG AAG CAAAACAG AG AC CCTGGACC 
TGAGAG CAGAGATGCCCATCACCTGTCCC ACT CAGAATGAG CCCTTCCTGAGAACCCC 
TCGGAATAGTAACTACACGTACCCCATCAAGCCAGCCATTGAGAACTGGGGCAGTGAC 
TTCCTGTGTAC^GAGTGGAAGGCTTCC^TAGTGTTCCAACCTCTGTCCACCAGCTCC 
GACCAGCAGACATCAAAGTGGTGGCCGCCCTGGGTGACTCTCTGACTGTGGCAGTGGG 
AGCTCGACCAAACAACTCCAGTGACCTACCCACATCTTGGAGGGGACTCTCTTGGAGC 
ATTGGAGGGGATGGGAACTTGGAGACTCAGACCACACTGCCCGACATTCTGAAGAAGT 
TCAACCCTTACCTCCTTGGCTTCrCTACCAGCACCTGGGAGGGGACAGCAGGACTAAA 
TGTGGCAGCGGAAGGGGCCAGAGCTAGGGACATGCCAGCCCAGGCCTGGGACCTGGTA 
GAGCGAATGAAAAACAGCCCCCAGGACATCAACCTGGAGAAAGACTGGAAGCTGGTCA 
CACT CTTCATTGGGGTCAACG ACTTGTGTCATT ACTGTGAGAAT C CGGTAGGCGAATA 
TGTTCAGCACATCCAACAGGCCCTGGACATCCTCTCTGAGGAGCTCCCAAGGGCTTTC 
GTCAACGTGGTGGAGGTCATGGAGCTGGCTAGCCTGTACCAGGGCCAAGGCGGGAAAT 
GTGCCATGCTGGCAGCTCAGAACAACTGCACTTGCCTCAGACACTCGCAAAGCTCCCT 
GGAGAAGCAAGAACTGAAGAAAGTGAACTGGAACCTCCAGCATGGCATCTCCAGTTTC 
TCCT ACTGG CACCAATACACACAG CGTGAGGACTTTGCGGTTGTGGTGCAGCCTTTCT 
TCCAAAACACACTCACCCCACTGAACAGAGGGGACACTGACCTCACCTTCTTCTCCGA 
GGACTGTTTTCACTTCTCAGACCGCGGGCATGCCGAGATGGCCATCGCACTCTGGAAC 
AACATGCTGGAACCAGTGGGCCGCAAG ACTACCT CCAACAACTTCACCC ACAGC CGAG 
CCAAACTCAAGTGCCCCTCTCCTGAGAGCCCTTACCTCTACACCCTGCGGAACAGCCG 
ATTGCTCCCAGACCAGGCTGAAG AAG CCC CCG AGGTGCTCTACTGGGCTGTC CCAGTG 
GCAG CGGG AGTCGGCCTTGTGGTGGGCAT CATCGGGACAGTGGTCTGG AGGTGCAGGA 
GAGGTGGCCGGAGGGAAGATCCTCCAATGAGCCTGCGCACTGTGGCCCTCTAGGCCCG 
G<KMTGGGTCCTCACCCTAAACTCCCTATAGCCACTCTCTTCACCGCCCTCTGCCCCA 


GCCACTCCCGGCCACCAGGACATGCTTCAATGCCTGGTGCCATAGGAAGCCCAGGGGA 


CAGTCACAACTT CTTGG 




ORF Start: ATG at 16 


ORF Stop: TAG at 4285 




SEQ ID NO: 86 


1423 aa 


MW at 159352.7kD 


NOV24c, 

CG57399-03 Protein Sequence 


MGIJ^PGIFLLELLLLLGC^TPQIHTSPRKSTLEGQLWPETVHSLKPSDIKFVAAIGNL 
E I VPDPGTGDLEKQDERPQQVCMGVMTVLS D 1 1 RYFS PSVPM PVCHTGKRVI PHDGAE 
DLWIQAQELVRNMKENQLDFQFDWKLINVFFSNASQCYLCPSAQQNGLAAGGVDELMG 
VLDYLQQEVPRAFVNLVDLS EVAEVSRQYHGTWLS PAPEPCNCSEETTRLAKWMQWS 
YQEAWNSLLASSRYSEQES FTWFQPFFYETTPSDPRLQDSTTLAWHLWNRMME PAGE 
KDEPLSVKHGRPMKCPSQESPYLFSYRNSNYLTRLQKPQDKLEVREGAEIRCPDKDPS 
DTVPTS VHRLKPAD INVI GALGDS LTAGNGAGSTPGNVLDVLTQ YRGLSWS VGGDENI 
GTVTTLAD I LRE FNPSLKGFSVGTGKETS PNAFLNQAVAGGRAEQARRLVDLMKNDTR 
I H FQEDWK 1 1 TL F I GGNDLCD FCNDL VH YS PQNFTDN I GKALD I LHAE VP RAFVNL VT 
VLEIVNLRELYQEKKVYCPRMILRSLCPCVLKFDDNSTELATLIEFNKKFQEKTHQLI 
ESGRYDTRED FTWVQP FFENVDM PKTQEG LPDNS F FA PDC FH FS S KSHS RAAS ALWN 
NMLEPVGQKTTRHKFENKINITCPNQVEWPFLRTYKNSMQGHGTWLPCRDRAPSALHP 
TSVHALRPADIQWAALGDSLTAGNGIGSKPDDLPDVTTQYRGLSYSAGGDGSLENVT 
TL PD I LRE FNRNLTG YAVGTGDANDTNAFLNQAV PGAKARD LMS QVQTLMQKMKDDHR 
VN FHEDWKVITVLI GGSDLCDYCTDSNLYS AAWFVHHLRNALDVLHREVPRVLVNLVD 
FLN PT I MRQ VFLGN PDKC P VQQAS VLCNCVLT LRENSQELARLEAFSRA YQS SMREL V 
GS GRYDTQED FS WLQP F FQNI Q L P VLQDG L PDTS F F A PD C I H PNQKFHS QLARALWT 
NMLEPLG S KTETLD LRAEMP I TC PTQNE P F LRTPRNSNYT Y P I KPA I ENWGS DFLCTE 
WKASNSVPTSVHQLRPADIKWAALGDSLTVAVGARPNNSSDLPTSWRGLSWSIGGDG 
NLETHTTLPDILKKFNPYLLGFSTSTWEGTAGIiNVAAEGARARDMPAQAWDLVERMKN 
SPQDINLEKDWKLVTLFIGVNDLCHYCENPVGEYVQHIQQALDILSEELPRAFVNWE 
VME LAS L YQGQGGKCAMLAAQNNCTCLRH S QS SLE KQE LKKVNWNLQHG I S S FS YWHQ 
YTQREDFAVWQPFFQKTLTPLNRGDTDLTFFSEDCFHFSDRGHAEMAIALWNNMLEP 
VGRKTTSNNFTHSRAKLKCPSPESPYLYTLRNSRLLPDQAEEAPEVLYWAVPVAAGVG 
LWGI IGTWWRCRRGGRREDPPMSLRTVAIi 
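
The ORF coordinates and protein lengths listed for the NOV24 sequences can be cross-checked against each other. The following is a minimal sketch, assuming that the reported "ORF Stop" position is the 1-based position of the first base of the stop codon; the function and variable names are illustrative and do not come from the specification.

```python
def expected_stop(orf_start: int, protein_length: int) -> int:
    """Return the 1-based position of the first base of the stop codon.

    Assumes the ORF starts at orf_start and encodes protein_length residues,
    each residue using one three-base codon.
    """
    return orf_start + 3 * protein_length


# (name, ORF start, reported ORF stop, protein length in aa), as listed above
records = [
    ("NOV24a", 1, 4258, 1419),
    ("NOV24b", 311, 1241, 310),
    ("NOV24c", 16, 4285, 1423),
]

# Map each sequence name to True when the reported stop matches the expectation
results = {
    name: expected_stop(start, length) == stop
    for name, start, stop, length in records
}
print(results)
```

A mismatch in such a check would suggest a transcription error in one of the three numbers rather than a property of the sequence itself.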



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 24B. 



Table 24B. Comparison of NOV24a against NOV24b through NOV24c. 

Protein Sequence | NOV24a Residues/Match Residues | Identities/Similarities for the Matched Region 
NOV24b | 454..748 / 1..293 | 283/295 (95%) / 285/295 (95%) 
NOV24c | 27..1419 / 23..1423 | 1211/1426 (84%) / 1261/1426 (87%) 
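
The identity percentages in Table 24B can be reproduced from the match counts. This is a sketch only: it assumes the percentage is the integer part of the ratio (no rounding up), which agrees with the two identity entries of Table 24B but is an assumption about how the table was generated. The function name is hypothetical.

```python
def truncated_percent(matches: int, aligned: int) -> int:
    """Integer part of 100 * matches / aligned (assumed reporting convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Identity counts for NOV24b and NOV24c against NOV24a, from Table 24B
identity_counts = {"NOV24b": (283, 295), "NOV24c": (1211, 1426)}
identity_percent = {
    name: truncated_percent(matches, aligned)
    for name, (matches, aligned) in identity_counts.items()
}
print(identity_percent)
```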



Further analysis of the NOV24a protein yielded the following properties shown in 
Table 24C. 



Table 24C. Protein Sequence Properties NOV24a 

PSort analysis: 0.6850 probability located in endoplasmic reticulum (membrane); 0.6400 probability located in plasma membrane; 0.4600 probability located in Golgi body; 0.1080 probability located in nucleus 

SignalP analysis: Likely cleavage site between residues 24 and 25 
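
The PSort entry of Table 24C is a semicolon-separated list of probabilities and locations, so it can be turned into a ranked list mechanically. The sketch below assumes the exact wording "probability located in" used in the table; the regular expression and function name are illustrative.

```python
import re

# One entry looks like: "0.6850 probability located in <location>"
PSORT_ENTRY = re.compile(r"(\d+\.\d+) probability located in ([^;]+)")


def parse_psort(text: str) -> list[tuple[float, str]]:
    """Return (probability, location) pairs, highest probability first."""
    hits = [(float(prob), loc.strip()) for prob, loc in PSORT_ENTRY.findall(text)]
    return sorted(hits, reverse=True)


psort_text = (
    "0.6850 probability located in endoplasmic reticulum (membrane); "
    "0.6400 probability located in plasma membrane; "
    "0.4600 probability located in Golgi body; "
    "0.1080 probability located in nucleus"
)
ranked = parse_psort(psort_text)
for probability, location in ranked:
    print(f"{probability:.4f}  {location}")
```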



A search of the NOV24a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 24D. 



Table 24D. Geneseq Results for NOV24a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV24a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value 
AAW30751 | Rat phospholipase-B/lipase - Rattus rattus, 1450 aa. [JP09248190-A, 22-SEP-1997] | 50..1403 / 60..1447 | 911/1404 (64%) / 1077/1404 (75%) | 0.0 
ABB11053 | Human phospholipase B homologue, 267 aa. [WO200157188-A2, 09-AUG-2001] | 985..1203 / 45..267 | 205/224 (91%) / 213/224 (94%) | e-117 
AAM25824 | Human protein sequence SEQ ID NO:1339 - Homo sapiens, 267 aa. [WO200153455-A2, 26-JUL-2001] | 985..1203 / 45..267 | 205/224 (91%) / 213/224 (94%) | e-117 
AAM95420 | Human reproductive system related antigen SEQ ID NO: 4078 - Homo sapiens, 148 aa. [WO200155320-A2, 02-AUG-2001] | 979..1106 / 4..133 | 110/130 (84%) / 117/130 (89%) | 3e-56 
A "DO 1 i on | Human phospholipase homologue, SEQ ID NO:1607 - Homo sapiens, 132 aa. [WO200157188-A2, 09-AUG-2001] | 393..478 / 43..132 | 84/90 (93%) / 86/90 (95%) | 3e-40 



In a BLAST search of public sequence databases, the NOV24a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 24E. 



Table 24E. Public BLASTP Results for NOV24a 

Protein Accession Number | Protein/Organism/Length | NOV24a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value 
Q05017 | Phospholipase ADRAB-B precursor (EC 3.1.-.-) - Oryctolagus cuniculus (Rabbit), 1458 aa. | 6..1416 / 2..1456 | 1042/1466 (71%) / 1179/1466 (80%) | 0.0 
O70320 | PHOSPHOLIPASE B - Cavia porcellus (Guinea pig), 1463 aa. | 7..1414 / 3..1458 | 965/1474 (65%) / 1135/1474 (76%) | 0.0 
O54728 | PHOSPHOLIPASE B - Rattus norvegicus (Rat), 1450 aa. | 50..1403 / 60..1447 | 911/1404 (64%) / 1077/1404 (75%) | 0.0 
Q96DP9 | CDNA FLJ30866 FIS, CLONE FEBRA2004110, HIGHLY SIMILAR TO PHOSPHOLIPASE ADRAB-B PRECURSOR (EC 3.1.-.-) - Homo sapiens (Human), 270 aa. | 454..714 / 1..259 | 257/261 (98%) / 258/261 (98%) | e-151 
Q9N2Z4 | HYPOTHETICAL 41.4 KDA PROTEIN - Caenorhabditis elegans, 377 aa. | 343..673 / 37..369 | 130/343 (37%) / 202/343 (57%) | 1e-59 
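
The Expect values in Table 24E use the BLAST shorthand in which a value such as e-151 stands for 1e-151. A minimal sketch of converting these strings to numbers and ranking hits follows; the function name is hypothetical, and the last value of the table is read here as 1e-59.

```python
def parse_expect(value: str) -> float:
    """Convert a BLAST Expect string such as 'e-151', '3e-56' or '0.0' to float."""
    value = value.strip()
    if value.startswith("e"):
        # BLAST omits the leading mantissa when it is exactly 1
        value = "1" + value
    return float(value)


# Accession -> Expect string, taken from Table 24E
expect_by_accession = {"Q9N2Z4": "1e-59", "Q96DP9": "e-151", "Q05017": "0.0"}

# Smaller Expect values indicate more significant matches
ranked = sorted(expect_by_accession, key=lambda acc: parse_expect(expect_by_accession[acc]))
print(ranked)
```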



PFam analysis predicts that the NOV24a protein contains the domains shown in the 
Table 24F. 
Table 24F. Domain Analysis of NOV24a 

Pfam Domain | NOV24a Match Region | Identities/Similarities for the Matched Region | Expect Value 
Lipase_GDSL: domain 1 of 3 | 360..484 | 54/147 (37%) / 116/147 (79%) | 4.8e-42 
Lipase_GDSL: domain 2 of 3 | 705..834 | 57/147 (39%) / 116/147 (79%) | 4.5e-44 
SecA_protein: domain 1 of 1 | 834..851 | 10/20 (50%) / 17/20 (85%) | 4.9 
Vitellogenin_N: domain 1 of 1 | 1107..1124 | 8/18 (44%) / 17/18 (94%) | 3.8 
Lipase_GDSL: domain 3 of 3 | 1062..1185 | 48/147 (33%) / 114/147 (78%) | 6.3e-37 
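
Table 24F mixes very small Expect values with values above 1, so a reader may want to separate the strong domain matches from the weak ones. The sketch below does this with an Expect cutoff of 1e-5; the cutoff, the class name and the function name are assumptions made for illustration and are not stated in the specification.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    name: str
    start: int
    end: int
    expect: float


# Match regions and Expect values as listed in Table 24F
hits = [
    DomainHit("Lipase_GDSL", 360, 484, 4.8e-42),
    DomainHit("Lipase_GDSL", 705, 834, 4.5e-44),
    DomainHit("SecA_protein", 834, 851, 4.9),
    DomainHit("Vitellogenin_N", 1107, 1124, 3.8),
    DomainHit("Lipase_GDSL", 1062, 1185, 6.3e-37),
]


def significant(domain_hits, threshold: float = 1e-5):
    """Keep hits whose Expect value is below the (assumed) threshold."""
    return [hit for hit in domain_hits if hit.expect < threshold]


strong = sorted(significant(hits), key=lambda hit: hit.start)
for hit in strong:
    print(f"{hit.name}: {hit.start}..{hit.end} (E = {hit.expect:g})")
```

Under this cutoff only the three Lipase_GDSL matches remain, in line with the phospholipase B homology reported in Tables 24D and 24E.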



Example 25. 

The NOV25 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 25A. 



Table 25A. NOV25 Sequence Analysis 




SEQ ID NO: 87 


1348 bp 


NOV25a, 

CG59311-01 DNA Sequence 


CTGGGTCGCCCCTGTTCTACCCAGATTGGGATGGCAGCGACGCTGATCCTGGAGCCCG 
CGGGCCGCTGCTGCTGGGACGAGCCGCTGCGCATCGCAGTGCGCGGCCTGGCCCCGGA 
GCAGCCAGTCACG CTGCGCACGTCCCTG CGCG ACG AAG AGGGCGCGCT CTTCCGGGCC 
CACGCGCGCTACCGTGCCGACGCCCGCGACGAGCTGGACCTGGAGCGCGCGCCCGCGC 
TGGGAGGCAGCTTCGCGGGGCTCCAGCCCATGGGGCTGCTGTGGGCGTTGGAGCCCGA 
GAAAGCCTTGGTG CGGCTGGTG AAG CGCG ACGTGCGGACGCCCTTCGCCGTC5GAGCTG 
GAAGTGCTGGACGGCCACGACACCGAGCCCGGGCGGCTGCTGTGCCTGGCGCAGAACA 
AGCGCGACTTTCTCCGGCCGGGGGTGCGGCGCGAGCCGGTGCGCGCGGGCCCGGTGCG 
CGCCGCGCTCTTCCTGCCGCCGGATAGGGGGCCCTTTCCTGGGATCATTGATCTGTTT 
GGGAGCAG CAGGGGCCTTTGTG AATACAGGGCCAGC CTC CTGGCCGG AC ATGGTTTTG 
CTGTGCTTGCCCTGGCTTATTTCAGATTTGAAGACCTCCCCGAAGATCTGAATGATGT 
ACATCTGGAGTACTTTGAAGAAGCCGTGGACTTTATGCTGCAGCATCCAAAGGTGAAA 
GGT CCT AGTATTG CGCTTCTTGG ATTTTCCAAAGGAGGTGACCTGTGTCTCTCAATGG 
CTTCTTTCTTGAAGGGCATCACAGCCACTGTACTTATCAATGCCTGTGTAGCCAACAC 
AGTAGCTCCTCTACATTACAAGGATATGATTATTCCTAAACTTGTCGATGATCTAGGA 
AAAGTAAAAATCACTAAGTCAGGATTTCTCACTTTTATGGACACTTGGAGCAATCCAC 
TGGAGGAACACAATCACCAAAGTCTTGTTCCATTGGAAAAGGCGCAGGTGCCCTTCTT 
GTTTATTGTTGGCATGG ATGATCAAAG CTGGAAGAGTGAATTCTATGCTCAG AT AGC C 
TCTGAAAGGCTACAAGCTCATGGGAAAGAAAGACCCCAGATAATCTGTTACCCAGAAA 
CTGGTCACTGTATTGACCCACCTTATTTTCCTCCTTCTAGAGCTTCTGTGCACGCTGT 
TTTGGGTG AGG CAAT ATTCT ATGG AGG TG AG C CAAAGG CTCA CT C AAAGG CAC AGGT A 
G ATG C CTGGC AG CAAATTCAAACTTTCTTCCAT AAAC AT CT CAATG GT AAAAAAT CTG 
TCAAGCACAGCAAAATATAACATTGTAGCCACAGACCAGATACCATTAATAAAAATCC 


TATTCATACAACTT 




ORF Start: ATG at 31 


ORF Stop: TAA at 1294 




SEQ ID NO: 88 


421 aa 


MW at 46815.4kD 


NOV25a, 

CG59311-01 Protein Sequence 


MAATLILEPAGRCCWDEPLRIAVRGLAPEQPVTLRTSLRDEEGALFRAHARYRADARD 
EU5LERAPAIXMSFAGLQPMGLLWALEPEKAIiVRLVKRDVRTPFAVELEVLDGHDTEP 
GRLLCLAQNKROFLRPGVRRE PVRAGPVRAALFLPPDRGPFPGI IDLFGSSRGLCEYR 
ASLIAGHGFAVIALAYFRFEDLPEDLm>VHLEYFEEAWFMLQHPKVKGPSIALIiGFS 
KGGDLCLSMASFLKGITATVLINACVANTVAPLHYKDMI I PKLVDDLGKVKITKSGFL 
TFMDTWSNPLEEHNHQSLVPLEKAQVPFLFIVGMDDQSWKSEFYAQIASERLQAHGKE 
RPQIICYPETGHCIDPPYFPPSRASVHAVLGEAIFYGGEPKAHSKAQVDAWQQIQTFF 
HKHLNGKK5VKHSKI 




SEQ ID NO: 89 


1021 bp 


NOV25b, 

CG59311-02 DNA Sequence 


AGATTGGGATGGCAGCGACGCTGATCCTGGAGCCCGCGGGCCGCTGCTGCTGGGACGA 
GCCGCTGCGCATCGCAGTGCGCGGCCTGGCCCCGGAGCAGCCAGTCACGCTGCGCACG 
TCCCTGCGCGACGAAGAGGGCGCGCTCTTCCGGGCCCACGCGCGCTACCGTGCCGACG 
CCTCTAATCCCGGCACTTTGGGGGGCCAAGGCAGGGGGCCCTTTCCTGGGATCATTGA 
TCTGTTTGGGAGCAGCAGGGGCCTTTGTGAATACAGGGCCAGCCTCCTGGCCGGACAT 
GGTTTTGCTGTGCTTGCCCTGGCTTATTTCAGATTTGAAGACCTCCCCGAAGATCTGA 
ATGATGTACATCTGGAGTACTTTGAAGAAGCCGTGGACTTTATGCTGCAGCATCCAAA 
GGTGAAAGGTCCTAGTATTGCGCTT CTTGGATTTTCCAAAGGAGGTG ACCTGTGT CTC 
TCAATGGCTTCTTTCTTGAAGGGCATCACAGCCACTGTACTTATCAATGCCTGTGTAG 
CCAACACAGTAGCTCCTCTACATTAC7VAGGATATGATTATTCCTAAACTTGTQ3ATGA 
TCTAGGAAAAGTAAAAATCACTAAGTCAGGATTTCTCACTTTTATGGACACTTGGAGC 
AATCCACTGGAGGAACACAATCACCAAAGTCTTGTTCCATTGGAAAAGGCGCAGGTGC 
CCTTCTTGTTTATTGTTGGCATGGATGATCAAAGCTGGAAGAGTGAATTCTATGCTCA 
GATAGCCTCTGAAAGGCTACAAGCTCATGGGAAAGAAAGACCCCAGATAATCTGTTAC 
CCAGAAACTGGTCACTGTATTGACCCACCTTATTTTCCTCCTTCTAGAGCTTCTGTGC 
ACGCTGTTTTGGGTGAGGCAATATTCTATGGAGGTGAGCCAAAGGCTCACTCAAAGGC 
ACAGGTAGATGCCTGGCAGCAAATTCAAACTTTCTTCCATAAACATCTCAATGGTAAA 
AAATC TGTCAAGCACAG CAAAATAT AACATTGTAG 




ORF Start: ATG at 9 


ORF Stop: TAA at 1011 




SEQ ID NO: 90 


334 aa 


MW at 36926.0kD 


NOV25b, 

CG59311-02 Protein Sequence 


MAATLI LEPAGRCCWPE PLRI AVRGLAPEQPVTLRTS LRDEEGALFRAHARYRADASN 
PGTLGGQGRGPFPGI IDLFGSS RGLCEYRASLLAGHGFAVLALAYFRFEDLPEDLNDV 
HLEYFEEAVDFMLQHPKVKGPSIALIX3FSKGGDLCLSMASFLKGITATVLINACVANT 
VAPLHYKDMI I PKLVDDLGKVKITKSGFLTFMDTWSNPLEEHNHQSLVPIiEKAQVPFL 
FIVGMDDQSWKSEFYAQIASERLQAHGKERPQIICYPETGHCIDPPYFPPSRASVHAV 
LG EAX FYGGE PKAHS KAQVDAWQQ I QT F FHKHLNGKKS VKH SKI 




SEQ ID NO: 91 


1021 bp 


NOV25c, 

CG59311-03 DNA Sequence 


AG ATTGGG ATGGCAGCG ACGCTGATCCTGG AG CC CGCGGGCCG CTG CTGCTGGGACGA 
GCCGCTGCGCATCGCAGTGCGCGGCCTGGCCCCGGAGCAGCCAGTCACGCTGCGCACG 
TCCCTGCGCGACGAAGAGGGCGCGCTCTTCCGGGCCCACGCGCGCTACCGTGCCGACG 
CCTCTAATCCCGGCACTTTGGGAGGCCAAGGCAGGGGGCCCTTTCCTGGGATCATTGA 
TCTGTTTGGGAGCAGCAGGGGCCTTTGTGAATACAGGGCCAGCCTCCTGGCCGGACAT 
GGTTTTGCTGTGCTTGCC CTGGCTTATTTCAGATTTGAAGACCT CCCCGAAG AT CTGA 
ATGATGTACATCTGGAGTACTTTGAAGAAGCCGTGGACTTTATGCTGCAGCATCCAAA 
GGTGAAAGGTCCTAGTATTGCGCTTCTTGGATTTTCCAAAGGAGGTGACCTGTGTCTC 
TCAATGG CTT CTTT CTTG AAGGG CATCACAG C CAC TG T ACTT AT CAATG C CTG TGT AG 
CCAAC ACAGTAG CTCCTCT ACATTACAAGGATATGATT ATTCCTAAACTTGT CGATG A 
TCTAGGAAAAGTAAAAATCACTAAGTCAGGATTTCTCACTTTTATGGACACTTGGAGC 
AATCCACTGG AGGAACACAATCAC CAAAGTCTTGTT CCATTGGAAAAGGCGCAGGTGC 
CCTTCTTGTTTATTGTTGGCATGGATGATCAAAGCTGGAAGAGTGAATTCTATGCTCA 
GATAGC CTCTGAAAGG CTACAAG CT CATGGGAAAGAAAGACCCCAGATAATCTGTT AC 
CCAGAAACTGGTCACTGTATTG AC C CACCTTATTTT CCTCCTTCTAGAGCTT CTGTGC 
ACGCTG1'1'1 U IGGGTGAGGCAATATTCTATGGAGGTGAGCCAAAGGCTCACTCAAAGGC 
ACAGGT AG ATGCCTGG CAGCAAATTCAAACTTTCTTC CATAAACATCTCAATGGT AAA 
AAATCTGT CAAGCACAGCAAAAT AT AACATTGTAG 




ORF Start: ATG at 9 


ORF Stop: TAA at 1011 




SEQ ID NO: 92 


334 aa 


MW at 36926.0kD 


NOV25c, 

CG59311-03 Protein Sequence 


MAATL I LEPAGRCCWDEPLRIAVRGLAPEQPVTLRTS LRDEEGALFRAHARYRADASN 
PGTIX^QGRGPFPGIIDLFGSSRGLCEYRASLLAGHGFAVLALAYFRFEDLPEDLNDV 
HLEYFEEAVDFMLQHPKVKGPSIALLGFSKGGDLCLSMASFLKGITATVLINACVANT 
VAPLHYKDMI I PKLVDDLGKVKITKSGFLTFMDTWSNPLEEHNHQSLVPLEKAQVPFL 
F I VGMDDQSWKS E FYAQ I AS ERLQ AHGKER PQ 1 1 CY P ETGHC I D PPYFPPS RAS VHAV 
LGEAI FYGGEPKAHSKAQVDAWQQIQTFFHKHLNGKKSVKHS KI 
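
The molecular weights listed in Table 25A can be sanity-checked against the protein lengths. The sketch below uses the common rule of thumb of about 110 Da per residue, which is an assumption and not a value from the specification, and it reads the listed figures (46815.4 and 36926.0) as daltons, since a protein of 421 residues weighs roughly 46 kDa. All names are illustrative.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption, not from the source


def approx_mass_da(protein_length: int) -> float:
    """Rough molecular mass in daltons from the number of residues."""
    return protein_length * AVERAGE_RESIDUE_MASS_DA


def relative_error(estimate: float, reported: float) -> float:
    """Absolute difference between estimate and reported, relative to reported."""
    return abs(estimate - reported) / reported


# name -> (length in aa, listed molecular weight), from Table 25A
listed = {"NOV25a": (421, 46815.4), "NOV25b": (334, 36926.0)}

errors = {
    name: relative_error(approx_mass_da(length), mass)
    for name, (length, mass) in listed.items()
}
for name, error in errors.items():
    print(f"{name}: estimate differs from listed value by {error:.1%}")
```

An exact figure would require summing the masses of the individual residues; the rough estimate is only meant to flag a listed weight that is off by an order of magnitude.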



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 25B. 



Table 25B. Comparison of NOV25a against NOV25b through NOV25c. 

Protein Sequence | NOV25a Residues/Match Residues | Identities/Similarities for the Matched Region 
NOV25b | 154..421 / 67..334 | 268/268 (100%) / 268/268 (100%) 
NOV25c | 154..421 / 67..334 | 268/268 (100%) / 268/268 (100%) 
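
The residue ranges in Table 25B can be checked against the alignment length in the identity column: both 154..421 and 67..334 should span the 268 positions reported as 268/268. A small sketch, with hypothetical function names, assuming ranges are inclusive and written as start..end:

```python
def parse_range(text: str) -> tuple[int, int]:
    """Parse an inclusive residue range written as 'start..end'."""
    start, end = text.replace(" ", "").split("..")
    return int(start), int(end)


def span_length(text: str) -> int:
    """Number of residues covered by an inclusive 'start..end' range."""
    start, end = parse_range(text)
    return end - start + 1


query_span = span_length("154..421")  # NOV25a residues
match_span = span_length("67..334")   # NOV25b or NOV25c residues
print(query_span, match_span)
```

Equal spans on both sides, matching the aligned length, indicate an ungapped alignment over that region.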



Further analysis of the NOV25a protein yielded the following properties shown in 
Table 25C. 



Table 25C. Protein Sequence Properties NOV25a 

PSort analysis: 0.4500 probability located in cytoplasm; 0.3630 probability located in microbody (peroxisome); 0.1958 probability located in lysosome (lumen); 0.1000 probability located in mitochondrial matrix space 

SignalP analysis: No Known Signal Sequence Predicted 



A search of the NOV25a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 25D. 



Table 25D. Geneseq Results for NOV25a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV25a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value 
AAM41490 | Human polypeptide SEQ ID NO 6421 - Homo sapiens, 494 aa. [WO200153312-A1, 26-JUL-2001] | 1..421 / 74..494 | 288/421 (68%) / 347/421 (82%) | e-175 
AAM39704 | Human polypeptide SEQ ID NO 2849 - Homo sapiens, 483 aa. [WO200153312-A1, 26-JUL-2001] | 1..421 / 63..483 | 288/421 (68%) / 346/421 (81%) | e-175 
AAY71112 | Human Hydrolase protein-10 (HYDRL-10) - Homo sapiens, 483 aa. [WO200028045-A2, 18-MAY-2000] | 1..421 / 63..483 | 288/421 (68%) / 346/421 (81%) | e-175 
AAB93479 | Human protein sequence SEQ ID NO:12766 - Homo sapiens, 483 aa. [EP1074617-A2, 07-FEB-2001] | 1..421 / 63..483 | 287/421 (68%) / 346/421 (82%) | e-175 
AAY07932 | Human secreted protein fragment encoded from gene 81 - Homo sapiens, 182 aa. [WO9918208-A1, 15-APR-1999] | 1..181 | 181/181 (100%) | e-105 





In a BLAST search of public sequence databases, the NOV25a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 25E. 



Table 25E. Public BLASTP Results for NOV25a 

Protein Accession Number | Protein/Organism/Length | NOV25a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value 
P49753 | Peroxisomal acyl-coenzyme A thioester hydrolase 2 (EC 3.1.2.2) (Peroxisomal long-chain acyl-coA thioesterase 2) (ZAP 128) - Homo sapiens (Human), 421 aa. | 1..421 / 1..421 | 288/421 (68%) / 347/421 (82%) | e-175 
Q9QYR7 | Peroxisomal acyl-coenzyme A thioester hydrolase 2 (EC 3.1.2.2) (Peroxisomal long-chain acyl-coA thioesterase 2) (PTE-Ia) - Mus musculus (Mouse), 432 aa. | 1..421 / 12..432 | 264/421 (62%) / 331/421 (77%) | e-157 
O88267 | Cytosolic acyl coenzyme A thioester hydrolase, inducible (EC 3.1.2.2) (Long chain acyl-CoA thioester hydrolase) (Long chain acyl-CoA hydrolase) (CTE-I) (LACH2) (ACH2) - Rattus norvegicus (Rat), 419 aa. | 1..421 / 1..419 | 268/421 (63%) / 318/421 (74%) | e-153 
Q9QYR9 | Acyl coenzyme A thioester hydrolase, mitochondrial precursor (EC 3.1.2.2) (Very-long-chain acyl-CoA thioesterase) (MTE-I) - Mus musculus (Mouse), 453 aa. | 3..413 / 44..452 | 264/411 (64%) / 321/411 (77%) | e-153 
O55137 | Cytosolic acyl coenzyme A thioester hydrolase, inducible (EC 3.1.2.2) (Long chain acyl-CoA thioester hydrolase) (Long chain acyl-CoA hydrolase) (CTE-I) - Mus musculus (Mouse), 419 aa. | 1..413 / 1..411 | 262/413 (63%) / 319/413 (76%) | e-153 



PFam analysis predicts that the NOV25a protein contains the domains shown in the 
Table 25F. 
Table 25F. Domain Analysis of NOV25a 

Pfam Domain | NOV25a Match Region | Identities/Similarities for the Matched Region | Expect Value 
No Significant Matches Found 



Example 26. 

The NOV26 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 26A. 



Table 26A. NOV26 Sequence Analysis 



SEQ ID NO: 93 



1375 bp 



NOV26a, 

CG59309-01 DNA Sequence 



GGGACGCCGGACGCCGTCCGGACATTCGGCGCGCTTGCCACGATCTTGGACGGGTCTC 



GGGCCTCGACCTTTGAATTCCCCGCTCCGGCTCCAAGA TGTCAGCAACGCTGATCCTG 
GAGCCCCCAGGCCGCTGCTGCTGGAACGAGCCGGTGCGCATTGCCGTGCGCGGCCTGG 
CCCCGGAGCAGCGGGTTACGCTGCGCGCGTCCCTGCGCGACGAGAAGGGCGCGCTCTT 
CCGGGCCCACGCGCGCTACTGCGCCGACGCCCGCGGCGAGCTGGACCTGGAGCGCGCA 
CCCGCGCTGGGCGGCAGCTTCGCGGGACTCGAGCCCATGGGGCTGCTCTGGGCCCTGG 
AACCCGAGAAGCCTTTTTGGCGCTTCCTGAAGCGGGACGTACAGATTCCTTTTGTCGT 
GGAGTTGGAGGTGCTGGACGGCCACGAC CCCGAG CCTGGACGGCTG CTGTGCCAGGCG 
CAGCACGAGCGCCACTTCCTCCCGCCAGGGGTGCGGCGCCAGTCGGTGCGAGCGGGCC 
GGGTGCGCGCCACGCTCTTCCTGCCGCCAGGTGAGCCTGGACCCTTCCCAGGGATCAT 
TG ACATCTTTGGTATTGG AGGGGG CCTCTTGGAATATCGAGC CAG C CT C CTTGCTGG C 
CATGGCTTTG CCACGTTGGCT CTAGCTT ATT ATAACTTTG AAGATCTCCCCAATAACA 
TGGACAACATATCCCTGGAGTACTTCGAAGAAGCCGTATGCTACATGCTTCAACATCC 
CCAGGTTAAAGGCCCAGGCATTGGGCTTTTGGGCATTTCTCTAGGAGCTGATATTTGT 
CTCTCAATGGCCTCATTCTTGAAGAATGTCTCAGCCACAGTTTCCATCAATGGATCTG 
GGATCAGTGGGAACACAGCCATCAACTATAAGCACAGTAGCATTCCACCATTGGGCTA 
TGACCTGAGG AG AATCAAGGT AGCTTTCTCAGGC CT CGTGGACATTGTGG ATATAAGG 
AATGCTCTCGTAGGAGGGTACAAGAACCCCAGCATGATTCCAATAGAGAAGGCCCAGG 
GG CCCATCCTGCTCATTGTTGG TCAGGATGAC CATAACTGG AGAAGTG AGTTGTATGC 
CCAAACAGTCTCTGAACGGTTACAGGCCCATGGAAAGGAAAAACCCCAGATCATCTGT 
TACCCTGGGACTGGGCATTACATCGAGCCTCCTTACTTCCCCCTGTGCCCAGCTTCCC 
TTCACAGATTACTGAACAAACATGTTATATGGGGTGGGGAGCCCAGGGCTCATTCTAA 
GG CCC AGG AAG ATG C CTGG AAGCAAATTCT AG C CTTCTTCTGCAAACAC CTGGG AG GT 
ACCCAGAAAACAG CTGTCCCT AAATTGTAATGCATTTGTCT 



ORF Start: ATG at 96 



ORF Stop: TAA at 1362 



SEQ ID NO: 94 



422 aa 



MW at 46455.1kD 



NOV26a, 

CG59309-01 Protein Sequence 



M S ATL I LE P PGRC CWNE P VR I AVRGLAPEQRVTLRAS LRDEKGALFRAHAR YCADARG 
ELDLERAPALGGS FAGLEPMGLLWALEPEKPFWRFLKRDVQI PFWELEVLDGHDPEP 
GRLLCQAQHERHFLPPGVRRQSVRAGRVRATLFLPPGEPGPFPGI IDI FG IGGGLLEY 
RASLIiAGHGFATIALAYYNFEDLPNNMDNISLEYFEEAVCYMLQHPQVKGPGIGLLGI 
S LGAD I C LSMAS F LKNVS ATVS I NGSG I SGNT AI NYKH S S I P PLG YDLRR I KVAFSGL 
VD I VDI RNALVGGYKNPSMI PI EKAQGP I LLI VGQDDHNWRSELYAQTVS ERLQAHGK 
EKPQI ICYPGTGHYIEPPYFPLCPASLHRLLNKHVIWGGEPRAHSKAQEDAWKQI LAF 
FCKHbGGTQKTAVPKL 



Further analysis of the NOV26a protein yielded the following properties shown in 
Table 26B. 



Table 26B. Protein Sequence Properties NOV26a 

PSort analysis: 0.4500 probability located in cytoplasm; 0.2585 probability located in lysosome (lumen); 0.1940 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space 

SignalP analysis: No Known Signal Sequence Predicted 



A search of the NOV26a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 26C. 



Table 26C. Geneseq Results for NOV26a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV26a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value 
AAM41490 | Human polypeptide SEQ ID NO 6421 - Homo sapiens, 494 aa. [WO200153312-A1, 26-JUL-2001] | 1..422 / 74..494 | 296/422 (70%) / 341/422 (80%) | e-179 
AAM39704 | Human polypeptide SEQ ID NO 2849 - Homo sapiens, 483 aa. [WO200153312-A1, 26-JUL-2001] | 1..422 / 63..483 | 296/422 (70%) / 341/422 (80%) | e-179 
AAY71112 | Human Hydrolase protein-10 (HYDRL-10) - Homo sapiens, 483 aa. [WO200028045-A2, 18-MAY-2000] | 1..422 / 63..483 | 296/422 (70%) / 341/422 (80%) | e-179 
AAB93479 | Human protein sequence SEQ ID NO:12766 - Homo sapiens, 483 aa. [EP1074617-A2, 07-FEB-2001] | 1..422 / 63..483 | 295/422 (69%) / 340/422 (79%) | e-178 
AAY07932 | Human secreted protein fragment encoded from gene 81 - Homo sapiens, 182 aa. [WO9918208-A1, 15-APR-1999] | 242..422 / 1..181 | 93/181 (51%) / 123/181 (67%) | 2e-48 



In a BLAST search of public sequence databases, the NOV26a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 26D. 
Table 26D. Public BLASTP Results for NOV26a 

Protein Accession Number | Protein/Organism/Length | NOV26a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value 
Q9QYR8 | PEROXISOMAL LONG CHAIN ACYL-COA THIOESTERASE IB - Mus musculus (Mouse), 421 aa. | 1..422 / 1..421 | 312/422 (73%) / 362/422 (84%) | 0.0 
P49753 | Peroxisomal acyl-coenzyme A thioester hydrolase 2 (EC 3.1.2.2) (Peroxisomal long-chain acyl-coA thioesterase 2) (ZAP 128) - Homo sapiens (Human), 421 aa. | 1..422 / 1..421 | 296/422 (70%) / 341/422 (80%) | e-178 
Q9QYR7 | Peroxisomal acyl-coenzyme A thioester hydrolase 2 (EC 3.1.2.2) (Peroxisomal long-chain acyl-coA thioesterase 2) (PTE-Ia) - Mus musculus (Mouse), 432 aa. | 1..422 / 12..432 | 281/424 (66%) / 333/424 (78%) | e-163 
O55137 | Cytosolic acyl coenzyme A thioester hydrolase, inducible (EC 3.1.2.2) (Long chain acyl-CoA thioester hydrolase) (Long chain acyl-CoA hydrolase) (CTE-I) - Mus musculus (Mouse), 419 aa. | 1..422 / 1..419 | 275/423 (65%) / 330/423 (78%) | e-162 
O88267 | Cytosolic acyl coenzyme A thioester hydrolase, inducible (EC 3.1.2.2) (Long chain acyl-CoA thioester hydrolase) (Long chain acyl-CoA hydrolase) (CTE-I) (LACH2) (ACH2) - Rattus norvegicus (Rat), 419 aa. | 1..422 / 1..419 | 276/423 (65%) / 329/423 (77%) | e-162 



PFam analysis predicts that the NOV26a protein contains the domains shown in the 
Table 26E. 



Table 26E. Domain Analysis of NOV26a 

Pfam Domain | NOV26a Match Region | Identities/Similarities for the Matched Region | Expect Value 
DLH: domain 1 of 2 | 144..188 | 17/52 (33%) / 32/52 (62%) | 63 
DLH: domain 2 of 2 | 394..411 | 9/18 (50%) / 13/18 (72%) | 2.6 

Example 27. 

The NOV27 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 27A. 



Table 27A. NOV27 Sequence Analysis 




SEQ ID NO: 95 


1333 bp 


NOV27a, 

CG57364-01 DNA Sequence 


CCTGGCCCCCAAGCTCCCCACTCTGGTGCCCCGAGCAGCCCTGTGGGCAAGCAGCCGC 


CGCCATOGCCGAGCACCTGGAGCTGCTGGCAGAGATGCCCATGGTGGGCAGGATGAGC 

CCCAGG CTGAGAAGG aggcpcagggc aag AAGfiGTrpTf^Gfzrz an rYyrwnmfia&rmzi 
GGCAGCCAGCCAAGGGCTCCTGAAGCAGGTCCTCTTCCCTCCCAGTGTTGTCCTTCTG 
GAGGCCGCTGCCCGAAATGACCTGGAAGAAGTCCGCCAGTTCCTTGGGAGTGGGGTCA 
GCCCTGACTTGGCCAACGAGGACGGCCTGACGGCCCTGCACCAGTGCTGCATTGATGA 
TTTCCGAGAGATGGTGCAGCAGCTCCTGGAGGCTGGGGCCAACATCAATGCCTGTGAC 

agtgagtgck^acgcctctgcatgctgcggccacctgcggccac^^ 

agctg ctcatcgccagtggcgccaatctcctggcggt caacaccgacgggaacatgcc 

CTATGACCTGTGTGATGATGAGCAGACGCTGGACTGCCTGGAGACTGCCATGGCCGAC 
CGTGGCATCACCCAGGACAGCATCGAGGCCGCCCGGGCCGTGCCAGAACTGCGCATGC 
TGGACGACATCCGGAGCCGGCTGCAGGCCGGGGCAGACCTCCATGCCCCCCTGGACCA 
CGGGGCCACGCTGCTGCACGTCGCAGCCGCCAACGGGTTCAGCGAGGCGGCTGCCCTG 
CTGCTGGAACACCGAGCCAGCCTGAGCGCTAAGGACCAAGACGGCTGGGAGCCGCTGC 
ACGCCGCGGCCTACTGGGGCCAGGTGCCCCTGGTGGAGCTGCTCGTGGCGCACGGGGC 
CGACCTGAACGCAAAGTCCCTGATGGACGAGACGC CC CTTGATGTGTGCGGGGACGAG 
G AGGTGCGGGCCAAG CTG CTGGAGCTG AAG CACAAG C ACGACGCCCTC CTG CGCG CCC 
AGAGCCGCCAGCGCTCCTTGCTGCGCCGCCGCACCTCCAGCGCCGGCAGCCGCGGGAA 
GGTGGTGAGGCGGGATGAGCCTAACCCAGCGCAGCGGCTGACGCATGTCCCAGAAGCG 
GCGCGCCCAGCAGGTGAAGATGTGGGCCCAGGCTGAGAAGGAGGCCCAGGGCAAGAAG 
GGT CCTGGGGAG CGTCCCCGGAAGGAGGCAGCCAGCCAAGGG CTCCTGAAGCAGGTC C 


TCTTCCCTCCCAGTGTTGTCCTTCTGGAGGCCGCTGCCCGAAATGACCTGGAAGAAG 




ORF Start: ATG at 63 


ORF Stop: TGA at 1194 




SEQ ID NO: 96 


377 aa 


MW at 41019.9kD 


NOV27a, 

CG57364-01 Protein Sequence 


MAEHLELLAEMPMVGRMSTQERLKHAQKRRAQQVKMWAQAEKEAQGKKGPGERPRKEA 
ASQGLLKQVLFPPSWLLEAAARNDLEEVRQFLGSGVSPDLANEDGLTALHQCCIDDF 
REMVQQLLEAGANINACDS ECWT PLHAAATCGHLHLVELLI ASGANLLAVNTDGNMP Y 
DLCDDEQTLDCLErrAMADRGITQDSIEAARAVPELRMLDDIRSRLQAGADLHAPLDHG 
ATLLHVAAANGFSEAAALLLEHRASLSAKDQDGWEPLHAAAYWGQVPLVELLVAHGAD 
LNAKSLMDETPLDVCGDEEVRAKLLELKHKHDALLRAQSRQRSLLRRRTSSAGSRGKV 
VRRDEPNPAQRLTHVPEAARPAGEDVGPG 



Further analysis of the NOV27a protein yielded the following properties shown in 
Table 27B. 



Table 27B. Protein Sequence Properties NOV27a 

PSort analysis: 0.3000 probability located in microbody (peroxisome); 0.3000 probability located in nucleus; 0.1547 probability located in lysosome (lumen); 0.1000 probability located in mitochondrial matrix space 

SignalP analysis: No Known Signal Sequence Predicted 



A search of the NOV27a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 27C. 
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Table 27C. Geneseq Results for NOV27a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV27a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM40636 


Human polypeptide SEQ ID NO 
5567 - Homo sapiens, 440 aa. 
[WO200153312-A1, 26-JUL-2001] 


89..351 
1..263 


262/263 (99%) 
263/263 (99%) 


e-151 


AAM38850 


Human polypeptide SEQ ID NO 
1995 - Homo sapiens, 410 aa. 
[WO200153312-A1, 26-JUL-2001] 


119..351 
1..233 


233/233 (100%) 
233/233(100%) 


e-132 


AAM78864 


Human protein SEQ ID NO 1526 - 
Homo sapiens, 567 aa. 
[WO200157190-A2, 09-AUG-2001] 


1..351 
1..348 


209/351 (59%) 
265/351 (74%) 


e-118 


ABB11817 


Human KIAA0823 protein 
homologue, SEQ ID NO:2187 - 
Homo sapiens, 536 aa. 
[WO200157188-A2, 09-AUG-2001] 


45..354 
3..318 


173/316 (54%) 
226/316 (70%) 


3e-94 


AAM79848 


Human protein SEQ ID NO 3494 - 
Homo sapiens, 536 aa. 
[WO200157190-A2, 09-AUG-2001] 


45..354 
3..318 


173/316 (54%) 
226/316 (70%) 


3e-94 
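
Each row of the results table pairs a query residue range with a match residue range and reports identities and similarities as counts over the aligned length. The sketch below shows one way such entries could be parsed for downstream checks. It assumes the `first..last` range notation and the `count/aligned (NN%)` notation used in the table; the helper names are hypothetical. The printed percentages are not recomputed here, because the table does not state how they were rounded.

```python
import re
from fractions import Fraction
from typing import Tuple


def parse_range(text: str) -> Tuple[int, int]:
    """Parse a residue range such as '89..351' into (first, last)."""
    first, last = text.strip().split("..")
    return int(first), int(last)


def span_length(text: str) -> int:
    """Number of residues covered by an inclusive 'first..last' range."""
    first, last = parse_range(text)
    return last - first + 1


def parse_fraction(text: str) -> Fraction:
    """Parse 'count/aligned (NN%)' and return the exact ratio count/aligned."""
    match = re.match(r"\s*(\d+)\s*/\s*(\d+)", text)
    if match is None:
        raise ValueError("not a count/aligned entry: %r" % text)
    return Fraction(int(match.group(1)), int(match.group(2)))


# First row of Table 27C (AAM40636).
print(span_length("89..351"), span_length("1..263"))  # 263 263
print(float(parse_fraction("262/263 (99%)")))
```

For this row both ranges cover 263 residues, which matches the aligned length in the denominator of the identity count.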



In a BLAST search of public sequence databases, the NOV27a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 27D. 



Table 27D. Public BLASTP Results for NOV27a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV27a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96I34 


UNKNOWN (PROTEIN FOR 
MGC: 14333) - Homo sapiens 
(Human), 528 aa. 


1..351 
1..351 


351/351 (100%) 
351/351 (100%) 


0.0 


Q923M0 


MYOSIN PHOSPHATASE 
TARGETING SUBUNIT 3 MYPT3 
- Mus musculus (Mouse), 524 aa 
(fragment). 


1..351 
1..351 


301/351 (85%) 
320/351 (90%) 


e-171 


AAL62093 


PROTEIN PHOSPHATASE 1 
REGULATORY SUBUNIT 16B - 
Mus musculus (Mouse), 568 aa. 


1..351 
1..348 


210/351 (59%) 
266/351 (74%) 


e-118 


Q95N27 


Bos taurus (Bovine), 568 aa. 


1..348 


266/351 (74%) 


e-118 


Q96T49 


CAAX BOX PROTEIN TIMAP - 
Homo sapiens (Human), 567 aa. 


1..351 
1..348 


209/351 (59%) 
265/351 (74%) 


e-117 



PFam analysis predicts that the NOV27a protein contains the domains shown in the 
Table 27E. 



Table 27E. Domain Analysis of NOV27a 


Pfam Domain 


NOV27a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


ank: domain 1 of 5 


70..102 


8/33 (24%) 
20/33 (61%) 


99 


ank: domain 2 of 5 


103..135 


16/33(48%) 
26/33 (79%) 


7.1e-08 


ank: domain 3 of 5 


136..168 


15/33 (45%) 
26/33 (79%) 


2.9e-07 


ank: domain 4 of 5 


231..263 


16/33 (48%) 
24/33 (73%) 


2e-06 


ank: domain 5 of 5 


264..296 


16/33 (48%) 
27/33 (82%) 


2.7e-08 
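
The match regions listed for the five ank domains can be summarized by their lengths and by the spacing between consecutive domains. The following is a small sketch, assuming each match region is read as an inclusive first..last residue range; the variable and function names are illustrative and do not come from the original text.

```python
from typing import List, Tuple

# Match regions for ank domains 1 to 5 of NOV27a, read as inclusive ranges.
ANK_REGIONS: List[Tuple[int, int]] = [
    (70, 102),
    (103, 135),
    (136, 168),
    (231, 263),
    (264, 296),
]


def region_lengths(regions: List[Tuple[int, int]]) -> List[int]:
    """Length in residues of each inclusive (first, last) region."""
    return [last - first + 1 for first, last in regions]


def inter_region_gaps(regions: List[Tuple[int, int]]) -> List[int]:
    """Residues lying between each pair of consecutive regions."""
    return [
        next_first - prev_last - 1
        for (_, prev_last), (next_first, _) in zip(regions, regions[1:])
    ]


print(region_lengths(ANK_REGIONS))     # [33, 33, 33, 33, 33]
print(inter_region_gaps(ANK_REGIONS))  # [0, 0, 62, 0]
```

Read this way, each predicted repeat spans 33 residues, the first three repeats are contiguous, and a 62-residue stretch separates the third repeat from the fourth.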



Example 28. 

The NOV28 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 28A. 



Table 28A. NOV28 Sequence Analysis 




SEQ ID NO: 97 


1719 bp 


NOV28a, 

CG59348-01 DNA Sequence 


CGGGCACAGGCTCACCCTCGAGTGGCACAGGAATCCCAGGTAGATGACGGCGGCCGCG 


GCTGGTGCTGCAGGGTCGGCAGCTCCCGCGGCAGCGGCCGGCGCCCCGGGATCTGGGG 
GCGCACCCTCAGGGTCX^CAGGGGGTGCTGATCGGGGACAGGCTGTACTCCGGGGTGCT 
CATCACCTTGGAGAACTGCCTCCTGCCTGACGACAAGCTCCGTTTCACGCCGTCCATG 
TCGAGCGGCCTCGACACCGACACAGAGACCGACCTCCGCGTGGTGGGCTGCGAGCTCA 
TCCAGGCGGCCGGTATCCTGCTCCGCCTGCCGCAGGTGGCCATGGCTACCGGGCAGGT 
GTTGTTCCAGCGGTTCTTTTATACCAAGTCCTTCGTGAAGCACTCCATGGAGCATGTG 
TCAATGGCCTGTGTCCACCTGGCTTCCAAGATAGAAGAGGCCCCAAGACGCATACGGG 
ACGTCATCAATGTGTTTCACCGCCTTCGACAGCTGAGAGACAAAAAGAAGCCCGTGCC 
TCTACTACTGGATCAAGATTATGTTAATTTAAAGAACCAAATTATAAAGGCGGAAAGA 
CGAGTTCTCAAAGAGTTGGGTTTCTGCGTCCATGTGAAGCATCCTCATAAGATAATCG 
TTATGTACCTTCAGGTGTTAGAGTGTGAGCGTAACCAACACCTGGTCCAGACCTCATG 
GAATTACATGAACGACAGCCTTCGCACCGACGTCTTCGTGCGGTTCCAGCCAGAGAGC 
ATCGCCTGTGCCTGCATTTATCTTGCTGCCCGGACGCTGGAGATCCCTTTGCCCAATC 
GTCCCCATTGGTTTCTTTTGTTTGGAGCAACTGAAGAAGAAATTCAGGAAATCTGCTT 
AAAGATCTTGCAGCTTTATGCTCGGAAAAAGGTTGATCTCACACACCTGGAGGGTGAA 
GTGGAAAAAAGAAAGCACGCTATCGAAGAGGCAAAGGCCCAAGCCCGGGGCCTGTTGC 
CTGGGGGCACACAGGTGCTGGATGGTACCTCGGGGTTCTCTCCTGCCCCCAAGCTGGT 
GGAATCCCCCAAAGAAGGTAAAGGGAGCAAGCCTTCCCCACTGTCTGTGAAGAACACC 
AAGAGGAGGCTGGAGGGCGCCAAGAAAGCCAAGGCGGACAGCCCCGTGAACGGCTTGC 
CAAAGGGGCGAGAGAGTCGGAGTCGGAGCCGGAGCCGTGAGCAGAGCTACTCGAGGTC 
CCCATCCCGATCAGCGTCTCCTAAGAGGAGGAAAAGTGACAGCGGCTCCACATCTGGT 
GGGTCCAAGTCGCAGAGCCGCTCCCGGAGCAGGAGTGACTCCCCACCGAGACAGGCCC 
CCCGCAGCGCTCCCTACAAAGGCTCTGAGATTCGGGGCTCCCGGAAGTCCAAGGACTG 
CAAGTACCCCCAGAAGCCACACAAGTCTCGGAGCCGGAGTTCTTCCCGTTCTCGAAGC 
AGGTCACGGGAGCGGGCGGATAATCCGGGAAAATACAAGAAGAAAAGTCATTACTACA 
GAGATCAGCGACGAGAGCGCTCGAGGTCGTATGAACGCACAGGCCGTCGCTATGAGCG 
GGACCACCCTGGGCACAGCAGGCATCGGAGGTGACACGTGCTTCAGACCGGTCTGGGG 
TGCGGCGCACACCTGGGCCCGTGCAGGGCTCAGCTCGGCAGCAGCTCTGAGGGCAGCT 


CAATGAAAAAGTGAATGCACACGCCCTTGTTGGCGTG 




ORF Start: ATG at 44 


ORF Stop: TGA at 1598 




SEQ ID NO: 98 


518 aa MW at 58034.5kD 


NOV28a, 

CG59348-01 Protein Sequence 


MTAAAAGAAGSAAPAAAAGAPGSGGAPSGSQGVLIGDRLYSGVLITLENCLLPDDKLR 
FTPSMSSGLDTDTETDLRVVGCELIQAAGILLRLPQVAMATGQVLFQRFFYTKSFVKH 
SMEHVSMACVHLASKIEEAPRRIRDVINVFHRLRQLRDKKKPVPLLLDQDYVNLKNQI 
IKAERRVLKELGFCVHVKHPHKIIVMYLQVLECERNQHLVQTSWNYMNDSLRTDVFVR 
FQPESIACACIYLAARTLEIPLPNRPHWFLLFGATEEEIQEICLKILQLYARKKVDLT 
HLEGEVEKRKHAIEEAKAQARGLLPGGTQVLDGTSGFSPAPKLVESPKEGKGSKPSPL 
SVKNTKRRLEGAKKAKADSPVNGLPKGRESRSRSRSREQSYSRSPSRSASPKRRKSDS 
GSTSGGSKSQSRSRSRSDSPPRQAPRSAPYKGSEIRGSRKSKDCKYPQKPHKSRSRSS 
SRSRSRSRERADNPGKYKKKSHYYRDQRRERSRSYERTGRRYERDHPGHSRHRR 



Further analysis of the NOV28a protein yielded the following properties shown in 
Table 28B. 



Table 28B. Protein Sequence Properties NOV28a 


Psort 
analysis: 


0.5500 probability located in endoplasmic reticulum (membrane); 0.2400 
probability located in nucleus; 0.1900 probability located in lysosome (lumen); 
0.1000 probability located in endoplasmic reticulum (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV28a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 28C. 



Table 28C. Geneseq Results for NOV28a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV28a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM94028 


Human stomach cancer expressed 
polypeptide SEQ ID NO 126 - Homo 
sapiens, 298 aa. [WO200109317-A1, 
08-FEB-2001] 


221..518 
1..298 


298/298 (100%) 
298/298 (100%) 


e-172 


AAG64403 


Human paneth cell enhanced 
expression-like protein - Homo 
sapiens, 298 aa. [WO200138372-A1, 
31-MAY-2001] 


221..518 
1..298 


298/298 (100%) 
298/298 (100%) 


e-172 
AAB94641 


Human protein sequence SEQ ID 
NO: 15526 - Homo sapiens, 298 aa. 
[EP1074617-A2, 07-FEB-2001] 


221..518 
1..298 


298/298(100%) 
298/298(100%) 


e-172 


AAM78533 


Human protein SEQ ID NO 1195 - 
Homo sapiens, 526 aa. 
[WO200157190-A2, 09-AUG-2001] 


2..518 
8..526 


316/526 (60%) 
390/526 (74%) 


e-168 


AAB94371 


Human protein sequence SEQ ID 
NO: 14909 - Homo sapiens, 526 aa. 
[EP1074617-A2, 07-FEB-2001] 


2..518 
8..526 


316/526 (60%) 
390/526 (74%) 


e-168 


In a BLAST search of public sequence databases, the NOV28a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 28D. 


Table 28D. Public BLASTP Results for NOV28a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV28a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96S94 


HYPOTHETICAL 58.1 KDA 
PROTEIN - Homo sapiens (Human), 
520 aa. 


3..518 
5..520 


516/516 (100%) 
516/516 (100%) 


0.0 


Q9JJA7 


BRAIN CDNA, CLONE MNCB-5160, 
SIMILAR TO MUS MUSCULUS 
PANETH CELL ENHANCED 
EXPRESSION PCEE MRNA - Mus 
musculus (Mouse), 518 aa. 


1..518 
1..518 


466/519 (89%) 
482/519 (92%) 


0.0 


Q9UK58 


CYCLIN L ANIA-6A - Homo sapiens 
(Human), 526 aa. 


2..518 
8..526 


316/526 (60%) 
390/526 (74%) 


e-167 


Q9R1Q2 


CYCLIN ANIA-6A - Rattus 
norvegicus (Rat), 527 aa. 


2..518 
9..527 


312/526 (59%) 
391/526(74%) 


e-165 


Q9WV44 


CYCLIN ANIA-6A - Mus musculus 
(Mouse), 531 aa. 


3..518 
15..531 


314/526 (59%) 
385/526 (72%) 


e-162 



PFam analysis predicts that the NOV28a protein contains the domains shown in the 
Table 28E. 



Table 28E. Domain Analysis of NOV28a 


Pfam Domain 


NOV28a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 
cyclin: domain 1 of 1 


46..190 


28/163 (17%) 
86/163 (53%) 


0.0022 


Srg: domain 1 of 1 


221..230 


4/10 (40%) 
10/10(100%) 


6.7 


transcript_fac2: domain 1 of 1 


235..253 


12/19 (63%) 
15/19 (79%) 


0.86 


cyclin_C: domain 1 of 1 


196..311 


22/139(16%) 
65/139 (47%) 


2.6 



Example 29. 

The NOV29 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 29A. 



Table 29A. NOV29 Sequence Analysis 




SEQ ID NO: 99 


1069 bp 


NOV29a, 

CG59245-01 DNA Sequence 


CGGGGCCTGGTCGGCAGCTGGGCCGCCATGGAGTCCACGCTGGGCGCGGGCATCGTGA 
TAGCCGAGGCGCTACAGAACCAGCTAGCCTGGCTGGAGAACGTGTGGCTCTGGATCAC 
CTTTCTGGGCGATCCCAAGATCCTCTTTCTGTTCTACTTCCCCGCGGCCTACTACGCC 
TCCCGCCGTGTGGGCATCGCGGTGCTCTGGATCAGCCTCATCACCGAGTGGCTCAACC 
TCATCTTCAAGTGGTTTCTTTTTGGAGACAGGCCCTTTTGGTGGGTCCATGAGTCTGG 
TTACTACAGCCAGGCTCCAGCCCAGGTTCACCAGTTCCCCTCTTCTTGTGAGACTGGT 
CCAGGTGGCAGCCCTTCTGGACACTGCATGATCACAGGAGCAGCCCTCTGGCCCATAA 
TGACGGCCCTGTCTTCGCAGGTGCGCTGGGTAAGGGTGATGCCTAGCCTGGCTTATTG 
CACCTTCCTTTTGGCGGTTGGCTTGTCGCGAATCTTCATCTTAGCACATTTCCCTCAC 
CAGGTGCTGGCTGGCCTAATAACTGGTTGGCTGATGACTCCCCGAGTGCCTATGGAGC 
GGGAGCTAAGCTTCTATGGGTTGACTGCACTGGCCCTCATGCTAGGCACCAGCCTCAT 
CTATTGGACCCTCTTTACACTGGGCCTGGATCTTTCTTGGTCCATCAGCCTAGCCTTC 
AAGTGGTGTGAGCGGCCTGAGTGGATACACGTGGATAGCCGGCCCTTTGCCTCCCTGA 
GCCGTGACTCAGGGGCTGCCCTGGGCCTGGGCATTGCCTTGCACTCTCCCTGCTATGC 
CCAGGTGCGTCGGGCACAGCTGGGAAATGGCCAGAAGATAGCCTGCCTTGTGCTGGCC 
ATGGGGCTGCTGGGCCCCCTGGACTGGCTGGGCCACCCCCCTCAGATCAGCCTCTTCT 
ACATTTTCAATTTCCTCAAGTACACCCTCTGGCCATGCCTAGTCCTGGCCCTCGTGCC 
CTGGGCAGTGCACATGTTCAGTGCCCAGGAAGCACCGCCCATCCACTCTTCCTGACTT 
CTTGTGTGCCTCCCTTTCCTTTCCC 




ORF Start: ATG at 28 


ORF Stop: TGA at 1039 




SEQ ID NO: 100 


337 aa MW at 37808.0kD 


NOV29a, 

CG59245-01 Protein Sequence 


MESTLGAGIVIAEALQNQLAWLENVWLWITFLGDPKILFLFYFPAAYYASRRVGIAVL 
WISLITEWLNLIFKWFLFGDRPFWWVHESGYYSQAPAQVHQFPSSCETGPGGSPSGHC 
MITGAALWPIMTALSSQVRWVRVMPSLAYCTFLLAVGLSRIFILAHFPHQVLAGLITG 
WLMTPRVPMERELSFYGLTALALMLGTSLIYWTLFTLGLDLSWSISLAFKWCERPEWI 
HVDSRPFASLSRDSGAALGLGIALHSPCYAQVRRAQLGNGQKIACLVLAMGLLGPLDW 
LGHPPQISLFYIFNFLKYTLWPCLVLALVPWAVHMFSAQEAPPIHSS 
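
The relationship between the listed DNA sequence, the ORF start position, and the predicted polypeptide can be illustrated by translating the nucleotides from the ATG onward. The sketch below assumes the standard genetic code and a 1-based ORF start; the names `CODON_TABLE` and `translate` are hypothetical helpers, not taken from the original disclosure. Whitespace is stripped before translation; any other non-ACGT character would raise a `KeyError`.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, orf_start: int) -> str:
    """Translate `dna` from the 1-based `orf_start` to the first stop codon.

    Translation also ends when fewer than three bases remain.
    """
    seq = "".join(dna.split()).upper()
    residues = []
    for pos in range(orf_start - 1, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


# First 58 nucleotides of the NOV29a DNA sequence; ORF start: ATG at 28.
first_line = "CGGGGCCTGGTCGGCAGCTGGGCCGCCATGGAGTCCACGCTGGGCGCGGGCATCGTGA"
print(translate(first_line, 28))  # MESTLGAGIV
```

The ten residues obtained from the first line of the DNA listing agree with the start of the NOV29a protein sequence.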




SEQ ID NO: 101 


1386 bp 


NOV29b, 

CG59245-02 DNA Sequence 


TGAGTCTGTACTTTCCGCCCTGGAGCAAGCCGGGGCCTGGTCGGCAGCTGGGCCGCCA 


TGGAGTCCACGCTGGGCGCGGGCATCGTGATAGCCGAGGCGCTACAGAACCAGCTAGC 
CTGGCTGGAGAACGTGTGGCTCTGGATCACCTTTCTGGGCGATCCCAAGATCCTCTTT 
CTGTTCTACTTCCCCGCGGCCTACTACGCCTCCCGCCGTGTGGGCATCGCGGTGCTCT 
GGATCAGCCTCATCACCGAGTGGCTCAACCTCATCTTCAAGTGGTTTCTTTTTGGAGA 
CAGGCCCTTTTGGTGGGTCCATGAGTCTGGTTACTACAGCCAGGCTCCAGCCCAGGTT 
CACCAGTTCCCCTCTTCTTGTGAGACTGGTCCAGGCAGCCCTTCTGGACACTGCATGA 
TCACAGGAGCAGCCCTCTGGCCCATAATGACGGCCCTGTCTTCGCAGGTGGCCACTCG 
GGCCCGCAGCCGCTGGGTAAGGGTGATGCCTAGCCTGGCTTATTGCACCTTCCTTTTG 
GCGGTTGGCTTGTCGCGAATCTTCATCTTAGCACATTTCCCTCACCAGGTGCTGGCTG 
GCCTAATAACTGGCGCTGTCCTGGGCTGGCTGATGACTCCCCGAGTGCCTATGGAGCG 
GGAGCTAAGCTTCTATGGGTTGACTGCACTGGCCCTCATGCTAGGCACCAGCCTCATC 
TATTGGACCCTCTTTACACTGGGCCTGGATCTTTCTTGGTCCATCAGCCTAGCCTTCA 
AGTGGTGTGAGCGGCCTGAGTGGATACACGTGGATAGCCGGCCCTTTGCCTCCCTGAG 
CCGTGACTCAGGGGCTGCCCTGGGCCTGGGCATTGCCTTGCACTCTCCCTGCTATGCC 
CAGGTGCGTCGGGCACAGCTGGGAAATGGCCAGAAGATAGCCTGCCTTGTGCTGGCCA 
TGGGGCTGCTGGGCCCCCTGGACTGGCTGGGCCACCCCCCTCAGATCAGCCTCTTCTA 
CATTTTCAATTTCCTCAAGTACACCCTCTGGCCATGCCCAGTCCTGGCCCTCGTGCCC 
TGGGCAGTGCACATGTTCAGTGCCCAGGAAGCACCGCCCATCCACTCTTCCTGACTTC 
TTGTGTGCCTCCCTTTCCTTTCCCTCCCACAAAGCCAACACTCTGTGACCACCACACT 


CCAGGAGGCAGCCCCATCCCCTTCCAGCCCCTAAGTAGGCCCTCCCCTCCCTAAATCT 


GCTTCCGCACCACCTGGTCTTAGCCCCAAAGATGGGCCTTCTCTCTCCCAGATAAGTT 


GGTCCTCCCTCTGCCTTTCCTCTCAAGCCCCCAAAGAGCAAAGGCAACAGCAAGACCA 


GCGGGTTCTTGCAACACTGTGAGGGGCAGCCAGGGCGGAAAGTACAGACTCA. 




ORF Start: ATG at 58 


ORF Stop: TGA at 1096 




SEQ ID NO: 102 


346 aa MW at 38718.0kD 


NOV29b, 

CG59245-02 Protein Sequence 


MESTLGAGIVIAEALQNQLAWLENVWLWITFLGDPKILFLFYFPAAYYASRRVGIAVL 
WISLITEWLNLIFKWFLFGDRPFWWVHESGYYSQAPAQVHQFPSSCETGPGSPSGHCM 
ITGAALWPIMTALSSQVATRARSRWVRVMPSLAYCTFLLAVGLSRIFILAHFPHQVLA 
GLITGAVLGWLMTPRVPMERELSFYGLTALALMLGTSLIYWTLFTLGLDLSWSISLAF 
KWCERPEWIHVDSRPFASLSRDSGAALGLGIALHSPCYAQVRRAQLGNGQKIACLVLA 
MGLLGPLDWLGHPPQISLFYIFNFLKYTLWPCPVLALVPWAVHMFSAQEAPPIHSS 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 29B. 



Table 29B. Comparison of NOV29a against NOV29b. 


Protein Sequence 


NOV29a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV29b 


1..337 
1..346 


335/347 (96%) 
335/347 (96%) 



Further analysis of the NOV29a protein yielded the following properties shown in 
Table 29C. 



Table 29C. Protein Sequence Properties NOV29a 


PSort 
analysis: 


0.6850 probability located in endoplasmic reticulum (membrane); 0.6400 
probability located in plasma membrane; 0.4600 probability located in Golgi 
body; 0.1000 probability located in endoplasmic reticulum (lumen) 


SignalP 
analysis: 


Likely cleavage site between residues 41 and 42 



A search of the NOV29a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 29D. 
Table 29D. Geneseq Results for NOV29a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV29a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM79500 


Human protein SEQ ID NO 3146 - 
Homo sapiens, 382 aa. 
[WO200157190-A2, 09-AUG-2001] 


1..337 
37..382 


336/347 (96%) 
336/347 (96%) 


0.0 


AAB42637 


Human ORFX ORF2401 polypeptide 
sequence SEQ ID NO:4802 - Homo 
sapiens, 377 aa. [WO200058473-A2, 
05-OCT-2000] 


1..337 
31..377 


328/348 (94%) 
328/348 (94%) 


0.0 


AAB85355 


Human phosphatase (PP) (clone ID 
1269556CD1) - Homo sapiens, 385 
aa. [WO200153469-A2, 26-JUL- 
2001] 


1..305 
1..314 


297/315 (94%) 
298/315(94%) 


e-174 


AAM78516 


Human protein SEQ ID NO 1178 - 
Homo sapiens, 404 aa. 
[WO200157190-A2, 09-AUG-2001] 


1..337 
125..404 


266/341 (78%) 
272/341 (79%) 


e-146 


AAB25679 


Human secreted protein sequence 
encoded by gene 15 SEQ ID NO:68 - 
Homo sapiens, 141 aa. 
[WO200043495-A2, 27-JUL-2000] 


198..337 
1..140 


140/140(100%) 
140/140 (100%) 


6e-81 



In a BLAST search of public sequence databases, the NOV29a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 29E. 



Table 29E. Public BLASTP Results for NOV29a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV29a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


AAH21574 


HYPOTHETICAL 38.7 KDA 
PROTEIN - Homo sapiens 
(Human), 346 aa. 


1..337 
1..346 


336/347 (96%) 
336/347 (96%) 


0.0 


Q9BUM1 


HYPOTHETICAL 40.1 KDA 
PROTEIN - Homo sapiens 
(Human), 360 aa (fragment). 


1..337 
15..360 


336/347 (96%) 
336/347 (96%) 


0.0 


O42153 


Glucose-6-phosphatase (EC 3.1.3.9) 
(G6Pase) (G-6-Pase) - 
Haplochromis nubilus, 352 aa. 


8..323 
8..339 


127/333 (38%) 
184/333 (55%) 


1e-59 
Q98UF8 


GLUCOSE-6-PHOSPHATASE - 
Sparus aurata (Gilthead sea bream), 
350 aa. 


8..323 
8..337 


123/333 (36%) 
185/333 (54%) 


2e-57 


Q9Z186 


GLUCOSE-6-PHOSPHATASE - 
Mus musculus (Mouse), 355 aa. 


7..325 
7..345 


128/343 (37%) 
188/343 (54%) 


5e-56 



PFam analysis predicts that the NOV29a protein contains the domains shown in the 
Table 29F. 



Table 29F. Domain Analysis of NOV29a 


Pfam Domain 


NOV29a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


PAP2: domain 1 of 1 


51..190 


38/175 (22%) 
95/175 (54%) 


0.00037 



Example 30. 

The NOV30 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 30A. 



Table 30A. NOV30 Sequence Analysis 




SEQ ID NO: 103 


1624 bp 


NOV30a, 

CG59241-01 DNA Sequence 


ATGGAACTGAAGGCCGAGGAGGAGGAGGTGGGTGGCGTCCAGCCGGTGGACTTGGTGG 
CCTTTGCCAACAGCTGCACCCTCCATGGCACCAACCACATTTTTGTGGAGGGGGGTCC 
AGGGCCAAGGCAGGTGCTGTGGGCGGTGGCCTTTGTCCTGGCACTGGGTGCCTTCCTG 
TGCCAGGTAGGGGACCGCGTTGCTTATTACCTCAGCTACCCACACGTGACCCTTCTAA 
ACGAAGTGGCCACCACGGAGCTGGCCTTCCCGGCAGTCACCCTCTGCAACACTAATGC 
TGTGCGGCTGTCCCAGCTCAGCTACCCTGACTTGCTTTATTTGGCCCCCATGCTGGGA 
CTGGATGAAAGTGATGACCCCGGGGTGCCCCTCGCTCCACCGGGCCCTGAGGCCTTCT 
CTGGGGAGCCCTTTAACCTGCACCGCTTCTACAATCGCTCCTGCCACCGGCTGGAGGA 
CATGCTGCTCTATTGCTCCTACCAAGGGGGACCCTGCGGCCCTCACAACTTCTCAGTG 
GTGTTCACACGCTATGGAAAGTGCTACACGTTCAACTCGGGCCGAGATGGGCGGCCGC 
GGCTGAAGACCATGAAGGGTGGGACGGGCAATGGGCTGGAAATCATGCTGGACATCCA 
GCAGGACGAGTACCTGCCTGTGTGGGGGGAGACTGACGAGACGTCCTTCGAAGCAGGC 
ATCAAAGTGCAGATCCATAGTCAGGATGAACCTCCTTTCATCGACCAGCTGGGCTTTG 
GCGTGGCCCCAGGCTTCCAGACCTTTGTGGCCTGCCAGGAGCAGCGGATCTACCTGCC 
CCCACCCTGGGGCACCTGCAAAGCTGTTACCATGGACTCGGATTTCTTCGACTCCTAC 
AGCATCACTGCCTGCCGCATCGACTGTGAGACGCGCTACCTGGTGGAGAACTGCAACT 
GCCGCATGGTGCACATGCCAGGTGATGCCCCATACTGTACTCCAGAGCAGTACAAGGA 
GTGTGCAGATCCTGCTCTGGACTTCCTGGTGGAGAAGGACCAGGAGTACTGCGTGTGT 
GAAATGCCTTGCAACCTGACCCGCTATGGCAAAGAGCTGTCCATGGTCAAGATCCCCA 
GCAAAGCCTCAGCCAAGTACCTGGCCAAGAAGTTCAACAAATCTGAGCAATACATAGG 
GGAGAACATCCTGGTGCTGGACATTTTCTTTGAAGTCCTCAACTATGAGACCATTGAA 
CAGAAGAAGGCCTATGAGATTGCAGGGCTCCTGGGTGACATCGGGGGCCAGATGGGGC 
TCTTCATCGGGGCCAGCATCCTCACGGTGCTGGAGCTCTTTGACTACGCCTACGAGGT 
AGTCATTAAGCACAAGCTGTGCCGACGAGGAAAATGCCAGAAGGAGGCCAAAAGGAGC 
AGTGCGGACAAGGGCGTGGCCCTCAGCCTGGACGACGTCAAAAGACACAACCCGTGCG 
AGAGCCTTCGGGGCCACCCTGCCGGGATGACATACGCTGCCAACATCCTACCTCACCA 
TCCGGCCCGAGGCACGTTCGAGGACTTTACCTGCTGAGCCCCGCAGGCCGCTGAACCA 
AAGGCCTAGATGGGGAGGACTAGGAGAGCGAGGGGGCCCCCAGCTGCCTCCTCACATC 




ORF Start: ATG at 1 


ORF Stop: TGA at 1543 




SEQ ID NO: 104 


514 aa MW at 57221.7kD 


NOV30a, 

CG59241-01 Protein Sequence 


MELKAEEEEVGGVQPVDLVAFANSCTLHGTNHIFVEGGPGPRQVLWAVAFVLALGAFL 
CQVGDRVAYYLSYPHVTLLNEVATTELAFPAVTLCNTNAVRLSQLSYPDLLYLAPMLG 
LDESDDPGVPLAPPGPEAFSGEPFNLHRFYNRSCHRLEDMLLYCSYQGGPCGPHNFSV 
VFTRYGKCYTFNSGRDGRPRLKTMKGGTGNGLEIMLDIQQDEYLPVWGETDETSFEAG 
IKVQIHSQDEPPFIDQLGFGVAPGFQTFVACQEQRIYLPPPWGTCKAVTMDSDFFDSY 
SITACRIDCETRYLVENCNCRMVHMPGDAPYCTPEQYKECADPALDFLVEKDQEYCVC 
EMPCNLTRYGKELSMVKIPSKASAKYLAKKFNKSEQYIGENILVLDIFFEVLNYETIE 
QKKAYEIAGLLGDIGGQMGLFIGASILTVLELFDYAYEVVIKHKLCRRGKCQKEAKRS 
SADKGVALSLDDVKRHNPCESLRGHPAGMTYAANILPHHPARGTFEDFTC 



Further analysis of the NOV30a protein yielded the following properties shown in 
Table 30B. 



Table 30B. Protein Sequence Properties NOV30a 


PSort 
analysis: 


0.7900 probability located in plasma membrane; 0.3000 probability located in 
Golgi body; 0.2000 probability located in endoplasmic reticulum (membrane); 
0.1000 probability located in mitochondrial inner membrane 


SignalP 
analysis: 


Likely cleavage site between residues 60 and 61 



A search of the NOV30a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 30C. 



Table 30C. Geneseq Results for NOV30a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV30a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAY69178 


A rat acid-sensitive cationic channel 
1B (rASIC1B) - Rattus sp, 559 aa. 
[WO200008149-A2, 17-FEB-2000] 


1..514 
47..559 


488/515 (94%) 
497/515(95%) 


0.0 


AAY03186 


Rat Acid sensitive ion channel 
protein sequence - Rattus sp, 513 aa. 
[WO9911784-A1, 11-MAR-1999] 


1..514 
1..513 


488/515(94%) 
498/515 (95%) 


0.0 


AAW68507 


Rat acid sensing ionic channel 1B - 
Rattus sp, 559 aa. [WO9835034-A1, 
13-AUG-1998] 


1..514 
47..559 


488/515 (94%) 
497/515 (95%) 


0.0 


AAY69175 


A rat acid-sensitive cationic channel 
1A (rASIC1A) - Rattus sp, 526 aa. 
[WO200008149-A2, 17-FEB-2000] 


1..514 
1..526 


416/527 (78%) 
445/527 (83%) 


0.0 


AAY03188 


Rat Acid sensitive ion channel alpha 
protein sequence - Rattus sp, 526 aa. 
[WO9911784-A1, 11-MAR-1999] 


1..514 
1..526 


416/527 (78%) 
445/527 (83%) 


0.0 
In a BLAST search of public sequence databases, the NOV30a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 30D. 



Table 30D. Public BLASTP Results for NOV30a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV30a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q91YB8 


ION CHANNEL - Rattus norvegicus 
(Rat), 559 aa. 


1..514 
47..559 


489/515(94%) 
498/515 (95%) 


0.0 


O88762 


ASIC-BETA - Rattus norvegicus (Rat), 
513 aa. 


1..514 
1..513 


488/515 (94%) 
498/515 (95%) 


0.0 


P55926 


Amiloride-sensitive brain sodium 
channel BNaC2 (Amiloride-sensitive 
cation channel neuronal 2) (Proton 
gated cation channel ASIC1) - Rattus 
norvegicus (Rat), 526 aa. 


1..514 
1..526 


416/527 (78%) 
445/527 (83%) 


0.0 


P78348 


Amiloride-sensitive brain sodium 
channel BNaC2 (Amiloride-sensitive 
cation channel neuronal 2) - Homo 
sapiens (Human), 574 aa. 


1..514 
1..574 


421/575 (73%) 
447/575 (77%) 


0.0 


Q99NA1 


PROTON-GATED CATION 
CHANNEL SUBUNIT ASIC-BETA2 - 
Rattus norvegicus (Rat), 425 aa. 


175..514 
86..425 


334/341 (97%) 
337/341 (97%) 


0.0 



PFam analysis predicts that the NOV30a protein contains the domains shown in the 
Table 30E. 
Table 30E. Domain Analysis of NOV30a 


Pfam Domain 


NOV30a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


ASC: domain 1 of 2 


21..118 


34/106(32%) 
79/106 (75%) 


1.6e-29 


ASC: domain 2 of 2 


145..442 


133/351 (38%) 
281/351 (80%) 


2.1e-139 



Example 31. 

The NOV31 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 31A. 



Table 31A. NOV31 Sequence Analysis 




SEQ ID NO: 105 


1949 bp 


NOV31a, 

CG58602-01 DNA Sequence 


TGCCTGGCTATGGCCCGACTCCTCAGGTCTGCAACCTGGGAGCTCTTCCCCTGGAGGG 
GCTACTGCTCCCAGTCCCTGCAGGGAGAGCTCTGCAGGGACTTCGTAGAGGCTCTGAA 
GGCCGTGGTGGGCGGCTCCCACGTGTCCACTGCCGCGGTGGTCCGAGAGCAGCACGGG 
CGCGATGAGTCGGTGCACAGGTGCGAACCTCCTGATGCTGTGGTGTGGCCCCAGAACG 
TGGAGCAGGTCAGCCGGCTGGCAGCCCTGTGCTATCGCCAAGGTGTX5CCCATCATCCC 
ATTCGGCACCGGCACCGGGCTTGAGGGTGGCGTCTGTGCTGTGCAGGGCGGCGTCTGC 
GT^AACCTGACGCATATGGACCGAATCCTGGAGCTGAACCAGGAGGACTTCTCTGTGG 
TGGTGGAGCCAGGTGTCACCCGCAAAGCCCTCAACGCCCACCTGCGGGACAGCGGCCT 
CTGGTTTCCTCCAGACCCAGGCGCGGACGCCTCTCTCTGTGGCATGGCGGCCACCGGG 
GCGTCGGGGACCAACGCGGTCCGCTACGGCACCATGCGGGACAACGTGCTCAACCTGG 
AGGTGGTGCTGCCCGACGGGCGGCTGCTGCACACGGCGGGCCGAGGCCTCATCACAGA 
TTCCACTGCTGCATTCCCCCACATCAGCCCCACTGAGTGCTTTTCCCAGGGGCCAGGG 
CCTCATGTCAATTCTCCTCACCCTGCCCCTGAGGCCACAGTGGCCGCCACGTGTGCGT 
TCCCCAGTCTCCAGGCTGCTGTGGACAGCACTGTACACATCCTCCAGGCTGCAGTGCC 
CGTAGCCCGCATTGAGTTCCTGGATGAAGTCATGATGGATGCCTGCAACAGGTACAGC 
AAGCTGAATTGCTTAGTGGCGCCCACACTCTTCCTGGAGTTCCATGGCTCCCAGCAGG 
CACTOGAGGAGCAGCTGCAGCGCACAGAGGAGATAGTCCAGCAGAACGGAGCCTCTGA 
CTTCTCCTGGGCCAAGGAGGCCGAGGAGCGCAGCCGGCTTTGGACAGCACGGCACAAT 
GCCTGGTACGCAGCCCTGGCCACGCGGCCAGGCTGCAAGGGCTACTCCACGGATGTGT 
GTGTGCCCATCTCCCGGCTGCCGGAGATCGTGGTGCAGACCAAGGAGGATCTGAATGC 
CTCAGGACTCACAGGAAGCATTGTCGGGCATCTGGGTGACGGCAACTTCCACTGCATC 
CTGCTGGTCAACCCTGATGACGCCGAGGAACTGGGCAGGGTCAAGGCTTTTGCAGAAC 
AGCTGGGCAGGCGGGCACTGGCTCTCCACGGAACGTGCACGGGGGAGCATGGCATCGG 
AATGGGCAAGCGGCAGCTGCTGCAGGAGGAGGTGGGCGCCGTGGGCGTGGAGACCATG 
CGGCAGCTCAAGGCCGTGCTAGACCCCCAAGGCCTCATGAATCCAGGCAAAGTGCTGT 
GAAGGGGGTCTGAGCACTTAGCCCACAAGTTCCCTGACTACGGAGCCGGTTCTGGAAC 


TTTTCTTCATGCCACGGCCCCTGCAAGGAAATAGATGCTGAGGCAGTCTTCCTGCCAG 


CGAGCCCACTGTATCTGGGCCCAAGGCCAGAGGGCCCAGAGAGAAGCCTGAGCACCGT 


GTTACCTCCCTGGCCCTCTGGCTGGCCCCAGGAGCCTTTGGTTCAGTAAACGACCCAG 


GGTGGTTCCCAGCAAAGCTGCTTCCTCTCTGCTCCTACGCATCCTGTCCTGGCGGGAA 


GAGAGCGTCTGGGTCCATTCAAGACTCTGATGACACCCCTCCCCGAGGCCTCCCACTG 


CCGGGGTCCCAGGACCCTTCCCCCTTCACCTGGTGACAGGAACACTCCTTTCCTGGTA 


TGGAACGTGAGCTCCCGTGACATGATGATAGGTCTTCTCCTTGGGGCCTCCCCCAATA 


AATCTGTAATAAACCTGAAACCCACCTACAGCTAA 




ORF Start: ATG at 10 


ORF Stop: TGA at 1450 




SEQ ID NO: 106 


480 aa MW at 51629.1kD 


NOV31a, 

CG58602-01 Protein Sequence 


MARLLRSATWELFPWRGYCSQSLQGELCRDFVEALKAVVGGSHVSTAAVVREQHGRDE 
SVHRCEPPDAVVWPQNVEQVSRLAALCYRQGVPIIPFGTGTGLEGGVCAVQGGVCVNL 
THMDRILELNQEDFSVVVEPGVTRKALNAHLRDSGLWFPPDPGADASLCGMAATGASG 
TNAVRYGTMRDNVLNLEVVLPDGRLLHTAGRGLITDSTAAFPHISPTECFSQGPGPHV 
NSPHPAPEATVAATCAFPSVQAAVDSTVHILQAAVPVARIEFLDEVMMDACNRYSKLN 
CLVAPTLFLEFHGSQQALEEQLQRTEEIVQQNGASDFSWAKEAEERSRLWTARHNAWY 
AALATRPGCKGYSTDVCVPISRLPEIVVQTKEDLNASGLTGSIVGHVGDGNFHCILLV 
NPDDAEELGRVKAFAEQLGRRALALHGTCTGEHGIGMGKRQLLQEEVGAVGVETMRQL 
KAVLDPQGLMNPGKVL 



Further analysis of the NOV31a protein yielded the following properties shown in 
Table 31B. 



Table 31B. Protein Sequence Properties NOV31a 


PSort 
analysis: 


0.6574 probability located in mitochondrial matrix space; 0.3502 probability 
located in mitochondrial inner membrane; 0.3502 probability located in 
mitochondrial intermembrane space; 0.3502 probability located in mitochondrial 
outer membrane 


SignalP 
analysis: 


Likely cleavage site between residues 20 and 21 



A search of the NOV31a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 31C. 



Table 31C. Geneseq Results for NOV31a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV31a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


ABB10446 


Human cDNA SEQ ID NO: 754 - 
Homo sapiens, 115 aa. 
[WO200154474-A2, 02-AUG-2001] 


1..96 
15..110 


91/96 (94%) 
92/96 (95%) 


8e-49 


AAE09597 


Human gene 5 encoded novel protein 
HDPMT22, SEQ ID NO:33 - Homo 
sapiens, 115 aa. [WO200155311-A2, 
02-AUG-2001] 


1..96 
15..110 


91/96 (94%) 
92/96 (95%) 


8e-49 


AAM52368 


GIP12-C4 protein - Arabidopsis 
thaliana, 159 aa. [FR2806095-A1, 14- 
SEP-2001] 


66..203 
3..140 


69/138 (50%) 
98/138 (71%) 


9e-34 


AAG92286 


C glutamicum protein fragment SEQ 
ID NO: 6040 - Corynebacterium 
glutamicum, 948 aa. [EP1108790-A2, 
20-JUN-2001] 


46..477 
25..502 


108/486 (22%) 
186/486 (38%) 


2e-22 


AAB79309 


Corynebacterium glutamicum SMP 
protein sequence SEQ ID NO: 134 - 
Corynebacterium glutamicum, 945 aa. 
[WO200100844-A2, 04-JAN-2001] 


46..477 
22..499 


108/486 (22%) 
186/486(38%) 


2e-22 
In a BLAST search of public sequence databases, the NOV31a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 31D. 



Table 31D. Public BLASTP Results for NOV31a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV31a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9D635 


4733401P21RIK PROTEIN - Mus 
musculus (Mouse), 481 aa. 


1..480 
1..481 


394/483 (81%) 
423/483 (87%) 


0.0 


Q19965 


F32D8.4 PROTEIN - Caenorhabditis 
elegans, 912 aa. 


20..480 
445..909 


221/466 (47%) 
307/466(65%) 


e-121 


CAD16371 


PUTATIVE D-LACTATE 
DEHYDROGENASE 
(CYTOCHROME) 
OXIDOREDUCTASE PROTEIN (EC 
1.1.2.4) - Ralstonia solanacearum 
(Pseudomonas solanacearum), 472 aa. 


32..479 
20..469 


226/454 (49%) 
300/454 (65%) 


e-119 


A89201 


protein F32D8.4 [imported] - 
Caenorhabditis elegans, 870 aa. 


30..480 
399..867 


214/469 (45%) 
296/469 (62%) 


e-115 


AAL51780 


D-LACTATE DEHYDROGENASE 
(CYTOCHROME) (EC 1.1.2.4) - 
Brucella melitensis, 468 aa. 


41..480 
28..467 


209/444 (47%) 
286/444 (64%) 


e-114 



PFam analysis predicts that the NOV31a protein contains the domains shown in the 
Table 31E. 



Table 31E. Domain Analysis of NOV31a 


Pfam Domain 


NOV31a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


FAD_binding_4: domain 1 of 1 


33..214 


70/208 (34%) 
154/208 (74%) 


3.7e-56 


FAD-oxidase_C: domain 1 of 1 


206..479 


91/307 (30%) 
210/307 (68%) 


1.3e-58 



Example 32. 



The NOV32 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 32A. 
Table 32A. NOV32 Sequence Analysis 




SEQ ID NO: 107 


698 bp 


NOV32a, 

CG58468-01 DNA Sequence 


CTCCTTCCTCTGCTCTTTATATGGACCAACAACTTCTCGTCCTTGGGGTCTCTGTGCA 
AATATCAATTTTTTCACATTATCTTTTCTCCACAGACATGAGAGGGAAGGCATTTATT 
TTCCCTCAAGAATCAGCTACAGTCTATGTGTCCCTGATCCCCAAGGTGAAGAAGCCCC 
TGAAGAACTTCAAGCTTTGCCTGAAAACCTTCACAGACTTCACCTGCCCTTATAGCCT 
CTTCTACAGCACTCGGTCCCAGGACAATGAGCTGCTTCTCCTTGTCAACAAAATGGGA 
ATGTATCTGCTGCACATTGGAAATGCTGCGGTCACTTTCAATGGCCCCACCCCCTGCC 
CTCGATCTCCTTATGCTTCGACCCATGTCAATGTGAGCTGGGAGTCTGCCTCTGGAAT 
TGCTACACTCTGGGCAAATGGGAAGCTGGTGGGGAGGAAGGGTGTGTGGAAGGGGTAC 
TCTGTGGGAGAAGAGGCTAAGATCATCCTGGGACAAGAGCAGGATTCCTTTGGGGGAC 
ATTTTGATGAAAATCAATCCTTTGTTGGGGTGATATGGGATGTGTTTTTGTGGGATCA 
TGTGCTCCCTCCAAAGGAGATGTGTGACTCCTGTTACAGCGGCAGCCTCCTGAATCGG 
CATACCCTGACTTATGAAGATAATGGCTATGTGGTAACTAAGCCCAAGGTGTGGGCTT 
AA 




ORF Start: ATG at 21 


ORF Stop: TAA at 696 




SEQ ID NO: 108 


225 aa MW at 25265.8kD 


NOV32a, 

CG58468-01 Protein Sequence 


MDQQLLVLGVSVQISIFSHYLFSTDMRGKAFIFPQESATVYVSLIPKVKKPLKNFKLC 
LKTFTDFTCPYSLFYSTRSQDNELLLLVNKMGMYLLHIGNAAVTFNGPTPCPRSPYAS 
THVNVSWESASGIATLWANGKLVGRKGVWKGYSVGEEAKIILGQEQDSFGGHFDENQS 
FVGVIWDVFLWDHVLPPKEMCDSCYSGSLLNRHTLTYEDNGYVVTKPKVWA 



Further analysis of the NOV32a protein yielded the following properties shown in 
Table 32B. 



Table 32B. Protein Sequence Properties NOV32a 


PSort 
analysis: 


0.5500 probability located in endoplasmic reticulum (membrane); 0.3200 
probability located in microbody (peroxisome); 0.2368 probability located in 
lysosome (lumen); 0.1000 probability located in endoplasmic reticulum (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV32a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 32C. 



Table 32C. Geneseq Results for NOV32a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV32a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAR74763 


Serum amyloid P component, 
promoter sapm - Homo sapiens, 204 
aa. [WO9505394-A, 23-FEB-1995] 


24..224 
2..203 


98/207 (47%) 
136/207(65%) 


4e-48 


AAR29923 


SAP - Homo sapiens, 223 aa. 
[WO9221364-A, 10-DEC-1992] 


7..224 
5..222 


101/224(45%) 
143/224(63%) 


3e-47 
AAR29922 


CRP - Homo sapiens, 225 aa. 
[WO9221364-A, 10-DEC-1992] 


14..224 
11..224 


100/218(45%) 
132/218(59%) 


2e-43 


AAR74769 


Female hamster protein, 1 flip - 
Cricetus cricetus, 210 aa. 
[WO9505394-A, 23-FEB-1995] 


24..222 
1..199 


95/206 (46%) 
132/206 (63%) 


6e-43 


AAY76844 


Human C reactive protein (CRP) 
sequence - Homo sapiens, 206 aa. 
[JP2000014388-A, 18-JAN-2000] 


24..224 
2..205 


98/208 (47%) 
128/208(61%) 


le-42 
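
Each Identities/Similarities entry in the table above pairs a count ratio with a printed percentage. The following is a minimal sketch, under the assumption that the printed percentage is an integer truncation of the ratio (this holds for entries such as 98/207 and 136/207 but is not stated in the text and does not hold for every entry); the helper names are hypothetical.

```python
import re

def parse_match(cell):
    """Parse an entry such as '98/207 (47%)' into (matched, length, printed_percent)."""
    m = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)%\s*\)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognised entry: {cell!r}")
    matched, length, printed = map(int, m.groups())
    return matched, length, printed

def truncated_percent(matched, length):
    """Integer percentage with the fractional part dropped (assumed convention)."""
    return matched * 100 // length

matched, length, printed = parse_match("98/207 (47%)")
recomputed = truncated_percent(matched, length)
# True when the printed value agrees with the truncation assumption.
consistent = recomputed == printed
```

An entry for which `consistent` is false is a candidate transcription error and is worth checking against the original alignment output.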



In a BLAST search of public sequence databases, the NOV32a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 32D. 



Table 32D. Public BLASTP Results for NOV32a

Protein Accession Number | Protein/Organism/Length | NOV32a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9D8J8 | 1810030J14RIK PROTEIN - Mus musculus (Mouse), 219 aa. | 6..224; 4..218 | 130/220 (59%); 166/220 (75%) | 5e-72
Q9D8V2 | 1810030J14RIK PROTEIN - Mus musculus (Mouse), 200 aa. | 6..190; 20..200 | 110/186 (59%); 139/186 (74%) | 2e-58
Q63913 | SERUM AMYLOID P - Cricetulus migratorius (Armenian hamster), 223 aa. | 1..224; 1..222 | 109/231 (47%); 152/231 (65%) | 4e-51
P23680 | Serum amyloid P-component precursor (SAP) - Rattus norvegicus (Rat), 228 aa. | 6..224; 4..223 | 105/224 (46%); 145/224 (63%) | 7e-50
P15697 | Female protein precursor (FP) (Serum amyloid P-component) - Cricetulus migratorius (Armenian hamster), 231 aa. | 1..222; 1..220 | 108/229 (47%); 151/229 (65%) | 7e-50



PFam analysis predicts that the NOV32a protein contains the domains shown in the 
Table 32E. 
Table 32E. Domain Analysis of NOV32a

Pfam Domain | NOV32a Match Region | Identities; Similarities for the Matched Region | Expect Value
pentaxin: domain 1 of 1 | 29..221 | 103/214 (48%); 156/214 (73%) | 8e-76



Example 33. 

The NOV33 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 33A. 



Table 33A. NOV33 Sequence Analysis 




SEQ ID NO: 109 | 3350 bp

NOV33a, CG58183-01 DNA Sequence


T^TGAGGAGACTGAGTTTGTXK3TGGCTGCTGAGCAGGGTCTGTCTGCTX5TTGCCGCC 
GCCCTGCGCACTGGTGCTGG CCGGGGTG CCCAGCTCCTCCTCGCACC CGCAG CCCTG C 
CAGATCCTCAAGCGCATCGGGCACGCGGTGAGGGTGGGCGCGGTGCACTTGCAGCCCT 
GGACCACCGCCCCCCGCGCGGCCAGCCGCGCTCCGGACGACAGCCGAGCAGGAGCCCA 
GAGGGATGAGCCGGAGCCAGGGACTAGGCGGTCCCCGGCGCCCTCGCCGGGCGCACGC 
TGGTTGGGGAGCACCCTGCATGGCCGGGGGCCGCCGGGCTCCCGTAAGCCCGGGGAGG 
GCGCCAGGGCGGAGGCCCTGTGGCCACGGGACGCCCTCCTATTTGCCGTGGACAACCT 
GAACCGCGTGGAAGGG CTGCTACC CTACAACCTGTCTT TGGAAGTAGTGATGGCC AT C 
GAGGCAGGCCTGGGCGATCTGCCACTTTTGCCCTTCTCCTCCCCTAGTTCGCCATGGA 
GCAGTX3ACCCTTTCTCCTTCCTGCAAAGTGTGTGCCATACCGTGGTGGTGCAAGGGGT 
GTCGGCGCTGCTCG CCTTCC CCCAG AGCCAGGGCGAAATGATGGAG CTCG ACTTGGTC 
AGCTTAGTCCTGCACATTCCAGTGATCAGCATCGTCCGCC^CGAGTTTCCACGGGAGA 
GT CAGAAT C CCCTT CACCT ACAAC TG AG TTT AG AAAATT CATT AAG TTCTG ATG CTG A 
TGTCACTGTCTCAATCCTGACCATGAACAACTGGTACAATTTTAGCTTGTTGCTGTGC 
CAGGAAGACTGGAACATCACCGACTTCCTCCTCCTTACCCAGAATAATTCCAAGTTCC 
ACCTTGGTTCTATCATCAACATCACCGCTAACCTCCCCTCCACCCAGGACCTCTTGAG 
CTTCCTACAGATCCAGCTTGAGAGTATTAAGAACAGCACACCCACAGTGGTGATGTTT 
GG CTG CGACATGG AAAGTATCCGGCGGATTTTCGAAATT ACAAC CCAGTTTGGGGTCA 
TGCCCCCTGAACTTCGTTGGGTGCTGGGAGATTCCCAGAATGTGGAGGAACTGAGGAC 
AG AGGGTCTGCCCTTAGGGCTCATTG CT CATGGAAAAACAACACAGTCTGTCTTTG AG 
CACTACGTACAAGATGCTATGGAGCTGGTCGCAAGAGCTGTAGCCACAGCCACCATGA 
TCCAACCAGAACTTGCTCTC ATTC CC AGCACG ATG AACTG CATGGAGGTGGAAACTAC 
AAATCTCACTTCAGGACAATATTTATCAAGGTTTCTAGCCAATACCACTTTCAGAGGC 
CTCAGTGGTTCCATCAGAGT AAAAGGTTCCAC CATCGT CAG CTCAGAAAACAACTTTT 
TCATCTGGAATCTTCAACATGACCCCATGGGAAAGCCAATGTGGACCCGCTTGGGCAG 
CTGG CAGGGGGGAAAGATTGTCATGGACT ATGG AATATGG CCAG AGCAGG CCCAGAG A 
CACAAAACCCACTTCCAACATCCAAGTAAGCTACACTTGAGAGTGGTTACCCTGATTG 
AGCATCCTTTTGTCTTCACAAGGGAGGTAGATGATGAAGGCTTGTGCCCTGCTGGCCA 
ACTCTGTCTAGACCCCATGACTAATGACTCTTCCACATTGGACAGCCTTTTTAGCAGC 
CTCC AT AG CAGTAATGATAC AGTG CC CATT AAATTCAAGAAGTGCTG CT ATGG ATATT 
GCATTGATCTGCTGGAAAAGATAGCAGAAGACATGAACTTTGACTTCGACCTCTATAT 
TGTAGGGGATGGAAAGTATGGAGCATGGAAAAATGGGCACTGGACTGGGCTAGTGGGT 
GATCTCCTGAGAGGGACTGCCCACATGGCAGTCACTTCCTTTAGCATCAATACTGCAC 
GGAGCCAGGTGATAGATTTCACCAGCCCTTTCTTCTCCACCAGCTTGGGCATCTTAGT 
GAGGACCCGAGATACAGCAGCTCCCATTGGAGCCTTCATGTGGCCACTCCACTGGACA 
ATGTGG CTGGGG ATTTTTGTG G CT CTG C ACAT C ACTG C CG T CTT C CT CACT C TG T ATG 
AATCGAAGAGTCCATTTGGTTTGACTTCCAAGGGGCGAAATAGAAGTAAAGTCTTCTC 
CTTTTCTTCAGCCTTGAACATCTGTTATGCCCTCTTGTTTGGCAGAACAGTGGCCATC 
AAACCTCCAAAATGTTGGACTGGAAGGTTTCTAATGAACCTTTGGGCCATTTTCTGTA 
TGTTTTGCCTTTCCACATACACGG CAAACTTGG CTGCTGTCATGGTAGGTG AGAAG AT 
CTATGAAGAGCTTTCTGGAATACATGACCCCAAGTTACATCATCCTTCCCAAGGATTC 
CG CTTTGG AACTGT CCG AG AAAG C AG TG CTG AAG ATT AT G TG AG AC AAAGTTTC CC AG 
AGATGCATGAATATATGAGAAGGTACAATGTTCCAGCCACCCCTGATGGAGTGGAGTA 
TCTGAAGAATGATCCAGAGAAACTAGACGCCTTCATCATGGACAAAGCCCTTCTGGAT 
TATGAAGTGTCAATAGATGCTGACTGCAAACTTCTCACTGTGGGGAAGCCATTTGCCA 
TAGAAGGTTACGG CATTGGCCTCCCACC CAACTCTCCATTG ACCG CCAACAT ATCCGA 
GCTAATCAGTCAATACAAGTCACATGGGTTTATGGATATGCTCCATGACAAGTGGTAC 
AGGGTGGTTCCCTGTGGCAAGAGAAGTTTTGCTGTCACGGAGACTTTGCAAATGGGCA 
TCAAACACTTCTCTGGGCTCTTTGTGCTGCTGTGCATTGGATTTGGTCTGTCCATTTT
GACCACCATTGGTGAGCACATAGTATACAGGCTGCTGCTACCACGAATCAAAAACAAA 
TCC AAGCTG CAATACTGGCTC CACACCAGCCAGAGATTAC AC AGAGCAAT AAATAC AT 
CATTTATAGAGGAAAAGCAGCAGCATTTCAAGACCAAACGTGTGGAAAAGAGATCTAA 
TGTGGGACCCCGTCAGCTTACCGTATGGAATACTTCCAATCTGAGTCATGACAACCGA 
CGGAAATACATCTTTAGTGATGAGGAAGGACAAAACCAGCTGGGCATCCGGATCCACC 
AGGACATCCCCCTCCCTCCAAGGAGAAGAGAGCTCCCTGCCTTGCGGACCACCAATGG 
GAAAGCAGACTCCCTAAATGTATCTCGGAACTCAGTGATGCAGGAACTCTCAGAGCTC 
G AG AAGCAG ATT CAGGTGATCCGTCAGGAGCTGC AGCTGGCTGTGAG CAGG AAAACGG 
AG CTGGAGGAGT ATCAAAGG ACAAGT CGG ACTTGTGAGTCCTAG 




ORF Start: ATG at 3 | ORF Stop: TAG at 3348

SEQ ID NO: 110 | 1115 aa | MW at 125453.7kD

NOV33a, CG58183-01 Protein Sequence


MRRLSLWWLLSRVCLLLPPPCALVLAGVPSSSSHPQPCQILKRIGHAVRVGAVHLQPW
TTAPRAASRAPDDSRAGAQRDEPEPGTRRSPAPSPGARWLGSTLHGRGPPGSRKPGEG
ARAEALWPRDALLFAVDNLNRVEGLLPYNLSLEVVMAIEAGLGDLPLLPFSSPSSPWS
SDPFSFLQSVCHTVVVQGVSALLAFPQSQGEMMELDLVSLVLHIPVISIVRHEFPRES
QNPLHLQLSLENSLSSDADVTVSILTMNNWYNFSLLLCQEDWNITDFLLLTQNNSKFH
LGSIINITANLPSTQDLLSFLQIQLESIKNSTPTVVMFGCDMESIRRIFEITTQFGVM
PPELRWVLGDSQNVEELRTEGLPLGLIAHGKTTQSVFEHYVQDAMELVARAVATATMI
QPELALIPSTMNCMEVETTNLTSGQYLSRFLANTTFRGLSGSIRVKGSTIVSSENNFF
IWNLQHDPMGKPMWTRLGSWQGGKIVMDYGIWPEQAQRHKTHFQHPSKLHLRVVTLIE
HPFVFTREVDDEGLCPAGQLCLDPMTNDSSTLDSLFSSLHSSNDTVPIKFKKCCYGYC
IDLLEKIAEDMNFDFDLYIVGDGKYGAWKNGHWTGLVGDLLRGTAHMAVTSFSINTAR
SQVIDFTSPFFSTSLGILVRTRDTAAPIGAFMWPLHWTMWLGIFVALHITAVFLTLYE
WKSPFGLTSKGRNRSKVFSFSSALNICYALLFGRTVAIKPPKCWTGRFLMNLWAIFCM
FCLSTYTANLAAVMVGEKIYEELSGIHDPKLHHPSQGFRFGTVRESSAEDYVRQSFPE
MHEYMRRYNVPATPDGVEYLKNDPEKLDAFIMDKALLDYEVSIDADCKLLTVGKPFAI
EGYGIGLPPNSPLTANISELISQYKSHGFMDMLHDKWYRVVPCGKRSFAVTETLQMGI
KHFSGLFVLLCIGFGLSILTTIGEHIVYRLLLPRIKNKSKLQYWLHTSQRLHRAINTS
FIEEKQQHFKTKRVEKRSNVGPRQLTVWNTSNLSHDNRRKYIFSDEEGQNQLGIRIHQ
DIPLPPRRRELPALRTTNGKADSLNVSRNSVMQELSELEKQIQVIRQELQLAVSRKTE
LEEYQRTSRTCES 
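
The ORF coordinates and the stated polypeptide length in the sequence tables can be cross-checked arithmetically. The sketch below assumes that each "at N" position is the 1-based index of the first base of the named codon (an assumption inferred from the listings, not stated in the text); the function name is hypothetical.

```python
def orf_residue_count(start, stop):
    """Number of encoded residues between the first codon and the stop codon.

    start: 1-based position of the first base of the first codon.
    stop:  1-based position of the first base of the stop codon.
    """
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF start and stop are not in the same reading frame")
    return span // 3

# NOV33a: ORF Start ATG at 3, ORF Stop TAG at 3348.
nov33a_residues = orf_residue_count(3, 3348)
# NOV32a: ORF Start ATG at 21, ORF Stop TAA at 696.
nov32a_residues = orf_residue_count(21, 696)
```

Under that assumption the computed counts can be compared directly with the "aa" values given beside each SEQ ID NO.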



Further analysis of the NOV33a protein yielded the following properties shown in 
Table 33B. 



Table 33B. Protein Sequence Properties NOV33a

PSort analysis: 0.6400 probability located in plasma membrane; 0.4600 probability
located in Golgi body; 0.3700 probability located in endoplasmic reticulum
(membrane); 0.1000 probability located in endoplasmic reticulum (lumen)

SignalP analysis: Likely cleavage site between residues 34 and 35



A search of the NOV33a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 33C. 
Table 33C. Geneseq Results for NOV33a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV33a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAU02199 | Human glutamate receptor-like protein, MEM4 - Homo sapiens, 1043 aa. [WO200144473-A2, 21-JUN-2001] | 95..1103; 6..1007 | 508/1047 (48%); 680/1047 (64%) | 0.0
AAB42494 | Human ORFX ORF2258 polypeptide sequence SEQ ID NO:4516 - Homo sapiens, 901 aa. [WO200058473-A2, 05-OCT-2000] | 95..985; 6..885 | 484/912 (53%); 635/912 (69%) | 0.0
AAU02198 | Human glutamate receptor-like protein, MEM3 - Homo sapiens, 971 aa. [WO200144473-A2, 21-JUN-2001] | 532..1103; 362..935 | 361/579 (62%); 448/579 (77%) | 0.0
AAU02197 | Human glutamate receptor-like protein, MEM2 - Homo sapiens, 965 aa. [WO200144473-A2, 21-JUN-2001] | 532..1103; 362..929 | 352/579 (60%); 437/579 (74%) | 0.0
AAR44192 | Rat NMDA receptor subunit, NR2A - Rattus rattus, 1464 aa. [DE4216321-A, 18-NOV-1993] | 175..1023; 77..911 | 245/873 (28%); 425/873 (48%) | 2e-83



In a BLAST search of public sequence databases, the NOV33a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 33D. 



Table 33D. Public BLASTP Results for NOV33a

Protein Accession Number | Protein/Organism/Length | NOV33a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
AAL40734 | N-METHYL-D-ASPARTATE RECEPTOR 3A - Homo sapiens (Human), 1115 aa. | 1..1115; 1..1115 | 1110/1115 (99%); 1112/1115 (99%) | 0.0
Q62800 | IONOTROPIC GLUTAMATE RECEPTOR - Rattus norvegicus (Rat), 1115 aa. | 1..1115; 1..1115 | 1032/1115 (92%); 1083/1115 (96%) | 0.0
Q9R1M7 | RECEPTOR SPLICE VARIANT NR3A-2 - Rattus norvegicus (Rat), 1135 aa. | 1..1135 | 1083/1135 (94%) | 0.0
CAC69380 | SEQUENCE 7 FROM PATENT WO0144473 - Homo sapiens (Human), 1043 aa. | 95..1103; 6..1007 | 508/1047 (48%); 680/1047 (64%) | 0.0
Q91ZU9 | NMDA-TYPE GLUTAMATE RECEPTOR SUBUNIT NR3B PRECURSOR - Mus musculus (Mouse), 1003 aa. | 112..1103; 34..980 | 510/1001 (50%); 669/1001 (65%) | 0.0



PFam analysis predicts that the NOV33a protein contains the domains shown in the 
Table 33E. 



Table 33E. Domain Analysis of NOV33a

Pfam Domain | NOV33a Match Region | Identities; Similarities for the Matched Region | Expect Value
lig_chan: domain 1 of 1 | 674..952 | 81/323 (25%); 232/323 (72%) | 4e-95



Example 34. 



The NOV34 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 34A. 



Table 34A. NOV34 Sequence Analysis 




SEQ ID NO: 111 | 1253 bp

NOV34a, CG59315-01 DNA Sequence


CCCAGCGCCATGGGGGAGTGGGCGTTCCTGGGCTCGCTGCTGGACGCCGTGCAGCTGC 
AGTCGCCGCTCGTGGGCCGCCTCTGGCTGGTGGTCATGCTGATCTTCCX3CATCCTGGT 
GCTGGCCACGGTGGGCGGCGCCGTGTTCGAGGACGAGCAAGAGGAGTTCGTGTGCAAC 
ACGCTGCAGCCGGGCTGTCGCCAGACCTGCTACGACCGCGCCTTCCCGGTCTCCCACT 
ACCGC^TTCTGGCTCTTCCACATCCTGCTCCTCTC^GCGCCCCCGGTGCTGTTCGTCGT 
CTACTCCATGCACCGGGCAGGCAAGGAGGCGGGCGGCGCTGAGGCGGCGGCGCAGTGC 
GCCCCCGGACTGCCCGAGGCCCAGTGCGCGCCGTGCGCCCTGCGCGCCCGCCGCGCGC 
GCCGCTCCTACCTGCTCAGrcTGGCGCTCCGCCTCCTGGCCGAGCTCACCTTCCTGGG 
CGGCCAGGCGCTGCTCTACGGCTTXrCGCGTGGCCCCGCACTTCGCGTGCGCCGGTCCG 
CCCTGCCCGCACACGGTCGACTGCTTCGTGAGCCGGCCCACCGAGAAGACCGTCTTCG 
TGCTCTTCTATTTCGCGGTGGGGCTGCTGTCGGCGCTGCTCAGCGTAGCCGAGCTGGG 
CCACCTGCTCTGGAAGGGCCGCCCGCGCGCCGGGGAGCGTGACAACCGCTGCAACCGT 
GCACACGAAGAGGCGCAGAAGCTGCTCCCGCCGCCGCCGCCGCCACCTCCGCCACCGG 
CCCTGCCCTCCCGGCGCCCCGGCCCCGAGCCGTGCGCCCCGCCGGCCTATGCGCACCC 
GG CGCCGGCCAGC CTCCGCG AGTGCGG CAGCGG CCGCGG CAGGAATGCG CCAATGGCT 
CCCAGATGTGGACGCCACCGCTTAACCCCTTACCCCCCAGCCGCGCTCCCCCAAGGGC 
CTTCCAGCCTGAGCCCCGCCAACAGCAGGGAGCTCTGCCCAGGTGAGAACCAGCCCAG 
GACTGGAGTCAGCGCCAGCCCGCCCCTAGTGCCCACGGACACCTCCCAACCTAGATCC 
TACCTGT CTTCCTTCCTTCAGGCTGG AGGfGG AAGGCTCATGGACACAAGAATGCAAG C 
ATGCATCCACACAGCTACACTGCCTCCCATCCCCTC^ 

CCTCCCTCGCTCCCC^TCCTGGCAGGGCXSGGCGGCGCAGAGCGCTCCACTCCGGATTC 
CCCACGCCCCCGAGCCGTTCGCAGGCTCGCACAAG 




ORF Start: ATG at 10 | ORF Stop: AG at 1252

SEQ ID NO: 112 | 414 aa | MW at 44773.0kD

NOV34a, CG59315-01 Protein Sequence



MGEWAFLGSLLDAVQLQSPLVGRLWLVVMLIFRILVLATVGGAVFEDEQEEFVCNTLQ
PGCRQTCYDRAFPVSHYRFWLFHILLLSAPPVLFVVYSMHRAGKEAGGAEAAAQCAPG
LPEAQCAPCALRARRARRCYLLSVALRLLAELTFLGGQALLYGFRVAPHFACAGPPCP
HTVDCFVSRPTEKTVFVLFYFAVGLLSALLSVAELGHLLWKGRPRAGERDNRCNRAHE
EAQKLLPPPPPPPPPPALPSRRPGPEPCAPPAYAHPAPASLRECGSGRGRNAPMAPRC
GRHRLTPYPPAALPQGPSSLSPANSRELCPGENQPRTGVSASPPLVPTDTSQPRSYLS
SFLEAGGEGSWTQECKHACTQLHCLPSPPADAARVPLPRSPSWQGGRRRALHSGFPTP
PSRSQART 



Further analysis of the NOV34a protein yielded the following properties shown in 
Table 34B. 



Table 34B. Protein Sequence Properties NOV34a

PSort analysis: 0.6000 probability located in plasma membrane; 0.4000 probability
located in Golgi body; 0.3000 probability located in endoplasmic reticulum
(membrane); 0.0300 probability located in mitochondrial inner membrane

SignalP analysis: Likely cleavage site between residues 39 and 40



A search of the NOV34a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 34C. 



Table 34C. Geneseq Results for NOV34a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV34a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAW49009 | Mouse alpha 3 connexin protein - Mus sp, 417 aa. [WO9830677-A1, 16-JUL-1998] | 1..296; 1..327 | 121/334 (36%); 169/334 (50%) | 5e-52
AAW23968 | Connexin protein Cx40 - Homo sapiens, 358 aa. [WO9802150-A1, 22-JAN-1998] | 1..215; 1..232 | 93/233 (39%); 133/233 (56%) | 9e-46
AAW23970 | Connexin protein Cx45 - Homo sapiens, 396 aa. [WO9802150-A1, 22-JAN-1998] | 4..212; 3..253 | 93/252 (36%); 137/252 (53%) | 3e-43
AAW23969 | Connexin protein Cx43 - Homo sapiens, 382 aa. [WO9802150-A1, 22-JAN-1998] | 1..216; 1..235 | 86/235 (36%); 130/235 (54%) | 1e-42
AAM93194 | Human polypeptide, SEQ ID NO: 2573 - Homo sapiens, 370 aa. [EP1130094-A2, 05-SEP-2001] | 7..384; 7..360 | 129/409 (31%); 169/409 (40%) | 8e-38
In a BLAST search of public sequence databases, the NOV34a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 34D. 



Table 34D. Public BLASTP Results for NOV34a

Protein Accession Number | Protein/Organism/Length | NOV34a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q91YD1 | CONNEXIN30.2 - Mus musculus (Mouse), 278 aa. | 1..283; 1..265 | 228/283 (80%); 240/283 (84%) | e-129
I46053 | connexin44 - bovine, 402 aa. | 1..397; 1..396 | 151/418 (36%); 207/418 (49%) | 1e-62
P41987 | Gap junction alpha-3 protein (Connexin 44) (Cx44) - Bos taurus (Bovine), 401 aa. | 2..397; 1..395 | 150/417 (35%); 206/417 (48%) | 4e-62
AAA50954 | CONNEXIN44 - Bos taurus (Bovine), 407 aa. | 1..398; 1..402 | 154/429 (35%); 214/429 (48%) | 1e-60
Q9TU17 | GAP JUNCTION PROTEIN (CONNEXIN) - Ovis aries (Sheep), 413 aa. | 1..398; 1..408 | 147/415 (35%); 204/415 (48%) | 1e-60
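
Expect values in the table above appear both with and without a leading mantissa (for example "4e-62" and "e-129"). A minimal sketch for converting them to numbers follows; reading a bare "e-129" as 1e-129 is an assumption about the output convention, and `parse_expect` is a hypothetical helper name.

```python
def parse_expect(text):
    """Convert a printed Expect value to a float.

    A value printed without a mantissa, such as 'e-129', is assumed to mean 1e-129.
    """
    value = text.strip()
    if value.startswith("e"):
        value = "1" + value
    return float(value)

# Expect values copied from Table 34D, in table order.
printed_values = ["e-129", "1e-62", "4e-62", "1e-60", "1e-60"]
numeric_values = [parse_expect(v) for v in printed_values]
# Smaller Expect values indicate more significant matches.
most_significant = min(numeric_values)
```

With the values converted, matches can be sorted or filtered against a significance threshold in the usual way.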



PFam analysis predicts that the NOV34a protein contains the domains shown in the 
Table 34E. 



Table 34E. Domain Analysis of NOV34a

Pfam Domain | NOV34a Match Region | Identities; Similarities for the Matched Region | Expect Value
DUF26: domain 1 of 1 | 107..152 | 12/56 (21%); 27/56 (48%) | 1.4
connexin: domain 1 of 1 | 1..212 | 101/247 (41%); 150/247 (61%) | 6.5e-75



Example 35. 

The NOV35 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 35A. 



Table 35A. NOV35 Sequence Analysis 




SEQ ID NO: 113 | 724 bp

NOV35a, CG59203-01 DNA Sequence


TAAATTCGCGGCCGCGTCGACCTTCCGCAGACTCAACTGAGAAGTCAGCCTCTGCGGC 


AGGCACCAGGAATCTGCCTTTTCAGTTCTGTCTCCGGCAGGCTTTGAGGATGAAGGCT


GCGGGCATTCTGACCCTCATTGGCTGCCTGGTCACAGGCGCCGAGTCCAAAATCTACA 
CTCGTTGCAAACTGGCAAAAATATTCTCGAGGGCTGGCCTGGACAATTACTGGGGCTT 
CAGCCTTGGAAACTGGATCTGCATGGCGTATTATGAGAGCGGCTACAACACCACAGCC 
CAGACGGTCCTGGATG ACGG CAGCATCGACTACGG CATCTTCCAGATCAACAGCTTCG 
CGTGGTGCAGACGCGGAAAGCTGAAGGAGAACAACCACTGCCACGTCGCCTGCTCAGC 
CTTGGTCACTGATGACCTCACAGATGCGATTATCTGTGCCAAGAAAATTGTTAAAGAG 
ACACAAGGAATGAACTATTGGCAAGGCTGGAAGAAACACTGTGAGGGGAGAGACCTGT 
CCGACTGGAAAAAAGACTGTGAGGTTTC CT AAACTGG AACTGG AC CCAGG ATG CTTTG 
CAGCAACGCCCTAGGGTTTGCAGTGAATGTCCAAATGCCTGTGTCATCTTGTCCCGTT 


TC CT CCCAATATTCCTTCTCAAACTTGGAGAGGG AAAATTAAGCT ATACTTTT AAGAA 


AATAAATATTTCCATTTAAATGTCAAAA 




ORF Start: ATG at 108 | ORF Stop: TAA at 552

SEQ ID NO: 114 | 148 aa | MW at 16655.9kD

NOV35a, CG59203-01 Protein Sequence


MKAAGILTLIGCLVTGAESKIYTRCKLAKIFSRAGLDNYWGFSLGNWICMAYYESGYN
TTAQTVLDDGSIDYGIFQINSFAWCRRGKLKENNHCHVACSALVTDDLTDAIICAKKI
VKETQGMNYWQGWKKHCEGRDLSDWKKDCEVS




SEQ ID NO: 115 | 453 bp

NOV35b, CG59203-02 DNA Sequence


CATTCTGACCCTCATTGGCTGCCTGGTCACAGGCGCCGAGTCCAAAATCTACACTCGT


TG C AAACTGG CAAAAAT ATT CT CG AGGG CTGG C CTGGACAATT A CTGGGG CTTCAGCC 


TTGG AAACTGGATCTGCATGGCGT ATTATG AGAGCGG CT ACAACACCACAG CCCAGAC 
GGTCCTGGATGACGGCAGCATCGACTACGGCATCTTCCAGATCAACAGCTTCGCGTGG 
TGCAGACGCGGAAAGCTGAAGGAGAACAACCACTGCCACGTCGCCTGCTCAGCCTTGG 
TCACTG ATGACCT CACAG ATG CAATT AT CTGTGC CAGG AAAATTGTT AAAG AGAC ACA 
AGGAATGAATTATTGGCAAGGCTGGAAGAAACATTGTCAGGGCAGAGACCTGTCCGAC 
TGGAAAAAAGGCTGTGAGGTTTCCTAAACTGGAACTGGACCCAGGAT 




ORF Start: ATG at 134 | ORF Stop: TAA at 431

SEQ ID NO: 116 | 99 aa | MW at 11288.6kD

NOV35b, CG59203-02 Protein Sequence


MAYYESGYNTTAQTVLDDGSIDYGIFQINSFAWCRRGKLKENNHCHVACSALVTDDLT
DAIICARKIVKETQGMNYWQGWKKHCEGRDLSDWKKGCEVS



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 35B. 



Table 35B. Comparison of NOV35a against NOV35b.

Protein Sequence | NOV35a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV35b | 50..148; 1..99 | 97/99 (97%); 98/99 (98%)
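
The identity count in the comparison above can be reproduced by an ungapped, position-by-position comparison of NOV35b with residues 50..148 of NOV35a. The sketch below is illustrative only: the helper name is hypothetical, and the two sequences are entered as read from the listings, with residues that are illegible in the scanned protein listings taken from translation of the corresponding DNA listings (an assumption). Similarities are not computed, since they depend on a substitution matrix that the text does not name.

```python
def count_identities(a, b):
    """Count positions at which two equal-length sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison requires equal lengths")
    return sum(x == y for x, y in zip(a, b))

# NOV35a (SEQ ID NO: 114), 148 residues.
NOV35A = (
    "MKAAGILTLIGCLVTGAESKIYTRCKLAKIFSRAGLDNYWGFSLGNWICMAYYESGYN"
    "TTAQTVLDDGSIDYGIFQINSFAWCRRGKLKENNHCHVACSALVTDDLTDAIICAKKI"
    "VKETQGMNYWQGWKKHCEGRDLSDWKKDCEVS"
)
# NOV35b (SEQ ID NO: 116), 99 residues.
NOV35B = (
    "MAYYESGYNTTAQTVLDDGSIDYGIFQINSFAWCRRGKLKENNHCHVACSALVTDDLT"
    "DAIICARKIVKETQGMNYWQGWKKHCEGRDLSDWKKGCEVS"
)

# Residues 50..148 of NOV35a (1-based, inclusive) align with NOV35b 1..99.
segment = NOV35A[49:148]
identities = count_identities(segment, NOV35B)
```

The two mismatching positions correspond to the K/R and D/G differences visible near the carboxy terminus of the two listings.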



Further analysis of the NOV35a protein yielded the following properties shown in 
Table 35C. 
Table 35C. Protein Sequence Properties NOV35a

PSort analysis: 0.3700 probability located in outside; 0.1697 probability located
in microbody (peroxisome); 0.1000 probability located in endoplasmic reticulum
(membrane); 0.1000 probability located in endoplasmic reticulum (lumen)

SignalP analysis: Likely cleavage site between residues 20 and 21



A search of the NOV35a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 35D. 



Table 35D. Geneseq Results for NOV35a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV35a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAY57399 | Human lysoenzyme LYC2 polypeptide - Homo sapiens, 148 aa. [WO200012722-A1, 09-MAR-2000] | 1..148; 1..148 | 143/148 (96%); 147/148 (98%) | 3e-86
AAU29169 | Human PRO polypeptide sequence #146 - Homo sapiens, 148 aa. [WO200168848-A2, 20-SEP-2001] | 1..148; 1..148 | 143/148 (96%); 146/148 (98%) | 6e-86
AAB66145 | Protein of the invention #57 - Unidentified, 148 aa. [WO200078961-A1, 28-DEC-2000] | 1..148; 1..148 | 143/148 (96%); 146/148 (98%) | 6e-86
AAY99396 | Human PRO1278 (UNQ648) amino acid sequence SEQ ID NO:203 - Homo sapiens, 148 aa. [WO200012708-A2, 09-MAR-2000] | 1..148; 1..148 | 143/148 (96%); 146/148 (98%) | 6e-86
AAY71109 | Human Hydrolase protein-7 (HYDRL-7) - Homo sapiens, 194 aa. [WO200028045-A2, 18-MAY-2000] | 1..148; 47..194 | 142/148 (95%); 146/148 (97%) | 1e-85



In a BLAST search of public sequence databases, the NOV35a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 35E. 
Table 35E. Public BLASTP Results for NOV35a

Protein Accession Number | Protein/Organism/Length | NOV35a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q96LF2 | BA14C22.1 (NOVEL PROTEIN SIMILAR TO LYSOZYME) - Homo sapiens (Human), 148 aa. | 1..148; 1..148 | 148/148 (100%); 148/148 (100%) | 7e-88
Q9H1R9 | BA534G20.1.1 (NOVEL PROTEIN SIMILAR TO LYSOZYME C-1 (1,4-BETA-N-ACYLMURAMIDASE C, EC 3.2.1.17) (ISOFORM 1)) - Homo sapiens (Human), 148 aa. | 1..148; 1..148 | 144/148 (97%); 147/148 (99%) | 4e-86
AAH21730 | HYPOTHETICAL 21.6 KDA PROTEIN - Homo sapiens (Human), 194 aa. | 1..148; 47..194 | 143/148 (96%); 146/148 (98%) | 2e-85
Q9CPX3 | 1700038F02RIK PROTEIN - Mus musculus (Mouse), 148 aa. | 1..148; 1..148 | 110/148 (74%); 127/148 (85%) | 3e-66
Q9H1R8 | BA534G20.1.2 (NOVEL PROTEIN SIMILAR TO LYSOZYME C-1 (1,4-BETA-N-ACYLMURAMIDASE C, EC 3.2.1.17) (ISOFORM 2)) - Homo sapiens (Human), 106 aa (fragment). | 20..125; 1..106 | 104/106 (98%); 106/106 (99%) | 1e-59



PFam analysis predicts that the NOV35a protein contains the domains shown in the 
Table 35F. 



Table 35F. Domain Analysis of NOV35a

Pfam Domain | NOV35a Match Region | Identities; Similarities for the Matched Region | Expect Value
lys: domain 1 of 1 | 20..145 | 68/129 (53%); 107/129 (83%) | 8e-58



Example 36. 



The NOV36 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 36A. 



Table 36A. NOV36 Sequence Analysis 
SEQ ID NO: 117 | 712 bp

NOV36a, CG58662-01 DNA Sequence


GCAGCTATTGCACTTAATCGCGGCTGCTAGCACCATGTCCCGCGTTTTGGTGCCTTGC


CATGTGAAAGGCACCGTAGCCCTGCAGGTGGGGGACGTATGGACCTCCCAAGGCCGGC 
CTAGTGTGCTGGTCATTGATGTCACCTTCCCCTGTGTCACTCCGTTCGAGGGGATCAC 
ATTTAAGAATTATTACACAGCGTTTTTTGAGCATCCTGTCTGTCAGCACACCTCAGCA 
CACACACCGGCCAAGTGGGTGACCTGCCTGTGGGACTACTGTCTGATGCCCGACCCAC 
ACAGTG AGGAGGG AGCCCAGG AGT ATGTGTCG CTGTT C AAG CAACAGATACTGTGTGA 
CATGG CCAGAAT ATCGG AG CTACAC CTGATTCTG CAG CAGCCATCACCACTGTGGCTG 
T CTTTCACAGTGGAGGAGCTG CAGATCTATCAG CAGGGACCAAAG AG CC CCT CCATGA 
TCTTCCCCAAGTGGCTCTCCCACCCAGTGCCCTGTGAGCAACCTGCACTCCTCCATGA 
GGG TCTCCCAGACCCCAGCAGGGTATCCT CTGAGGTGCAG CAGATGTGGGCACTGACA 
GAGATGATCCGGGCCAGTCACACCTCCGCGAGGATAGGCCACTTTGATGTAGATGGCT 
GTTATGACCTGAACTTACTCTCCTACACTTGAGTGGTGGCTCCTAGCCAAGATGTTGG 
CCTTTCTGTGCCCACT 




ORF Start: ATG at 35 | ORF Stop: TGA at 668

SEQ ID NO: 118 | 211 aa | MW at 23932.3kD

NOV36a, CG58662-01 Protein Sequence


MSRVLVPCHVKGTVALQVGDVWTSQGRPSVLVIDVTFPCVTPFEGITFKNYYTAFFEH
PVCQHTSAHTPAKWVTCLWDYCLMPDPHSEEGAQEYVSLFKQQILCDMARISELHLIL
QQPSPLWLSFTVEELQIYQQGPKSPSMIFPKWLSHPVPCEQPALLHEGLPDPSRVSSE
VQQMWALTEMIRASHTSARIGHFDVDGCYDLNLLSYT




SEQ ID NO: 119 | 843 bp

NOV36b, CG58662-02 DNA Sequence


CTGGCCTGAAGGCATGTCCCGCGTTCTAGCACCATGTCCCGCGTCTAGCACCATGTCC 


CGCGTCTAGCACCATGTCCCGCGTTCTAGCACCATGTCCCGCGTTCTAGCACCATGTC 


CCGCGTTCTAGCACCATGTCCCGCGTTTTGGTGCCTTGCCATGTGAAAGGCTCCGTAG 
CCCTCCAGGTGGGCGACGTGCGGACCTCCCAAGGCCGGCCTGGCGTGCTGGTCATCGA 
TGTCACCTTCCCCAGCGTCGCTCCCTTCGAGTTGCAGGAAATCACGTTTAAGAATTAC 
TAC ACAGCTTTTTTGAGCATCCGTGTC CGTCAGT ACACCTCAG CACACACACCTGCCA 
AGTGGGTG ACCTG CCTTCGGGACTACTGCCTGATG CCTGACCCACACAGTG AAGAGGG 
AGCCCAGGAGTATGTATCGCTGTTCAAGCATCAGATGCTATGTGACATGGCTAGAATA 
TCGGAGCTACGCCTGATTCTGCGGCAGCCATCACCACTGTGGCTGTCTTTCACAGTGG 
AGGAGCTGCAGATCTATCAGCAGGGACCAAAGAGCCCCTCCGTGACCTTTCCCAAGTG 
GCTCTCCCACC CAGTGCCCTGTGAGC AACCTGCACTC CTCCGTGAGGGTTT CCCAG AC 
CCCAGCAGGGTATCCTCCGAGGTGCAGCAGATGTGGGCACTGACAGAGATGATCCGGG 
CCAGTCACACCTCCGCAAGGATCGGCCGCTTTGATGTGGATGGCTGTTATGACCTGAC 
CTTGCTCTCCTACACTTGAATGGTTGCTCTTAGCCAAGATGTTGGCCTTTTTGTGGGC 


AC AGAAAGG CCAACGCGGG ACATGGTG CTAG 




ORF Start: ATG at 132 | ORF Stop: TGA at 771

SEQ ID NO: 120 | 213 aa | MW at 24222.6kD

NOV36b, CG58662-02 Protein Sequence


MSRVLVPCHVKGSVALQVGDVRTSQGRPGVLVIDVTFPSVAPFELQEITFKNYYTAFL 
SIRVRQYTSAHTPAKWVTCLRDYCLMPDPHSEEGAQEYVSLFKHQMLCDMARISELRL
ILRQPSPLWLSFTVEELQIYQQGPKSPSVTFPKWLSHPVPCEQPALLREGFPDPSRVS
SEVQQMWALTEMIRASHTSARIGRFDVDGCYDLTLLSYT 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 36B. 



Table 36B. Comparison of NOV36a against NOV36b.

Protein Sequence | NOV36a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV36b | 1..211; 1..213 | 188/213 (88%); 193/213 (90%)



Further analysis of the NOV36a protein yielded the following properties shown in 
Table 36C. 
Table 36C. Protein Sequence Properties NOV36a

PSort analysis: 0.5666 probability located in microbody (peroxisome); 0.4500
probability located in cytoplasm; 0.1562 probability located in lysosome (lumen);
0.1000 probability located in mitochondrial matrix space

SignalP analysis: No Known Signal Sequence Predicted


A search of the NOV36a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 36D. 


Table 36D. Geneseq Results for NOV36a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV36a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAG04038 | Human secreted protein, SEQ ID NO: 8119 - Homo sapiens, 115 aa. [EP1033401-A2, 06-SEP-2000] | 1..103; 1..105 | 82/105 (78%); 85/105 (80%) | 1e-39


In a BLAST search of public sequence databases, the NOV36a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 36E. 


Table 36E. Public BLASTP Results for NOV36a

Protein Accession Number | Protein/Organism/Length | NOV36a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9BSH3 | SIMILAR TO RIKEN CDNA 1500032A17 GENE - Homo sapiens (Human), 213 aa. | 1..211; 1..213 | 190/213 (89%); 195/213 (91%) | e-107
Q9CQM0 | 1500032A17RIK PROTEIN - Mus musculus (Mouse), 213 aa. | 1..211; 1..213 | 174/213 (81%); 183/213 (85%) | 4e-97



PFam analysis predicts that the NOV36a protein contains the domains shown in the 
Table 36F. 
Table 36F. Domain Analysis of NOV36a

Pfam Domain | NOV36a Match Region | Identities; Similarities for the Matched Region | Expect Value
No Significant Matches Found



Example 37. 

The NOV37 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 37A. 



Table 37A. NOV37 Sequence Analysis 




SEQ ID NO: 121 | 520 bp

NOV37a, CG58584-01 DNA Sequence


CATTTGCTGTCTCCTCTGCTCACCAGCAGCTGTACTGGAGCCACCCGCGAAAATTCGG 
CCAGGGTTCTCGCTCTTGTCGTGTCTGTTCAAACCGGCACGGTCTGATCCGGAAATAT 
GGCCTCAATATGTGCCGCCAGTGTTTCCGTCAGTACGCGAAGGATATCGGTTTCATTA 
AGAAAGACCTGAGCTGTCTTCCTTGGCACTGCCTATGGAGGTGACACCCATCTCCTCC 
ATCATGGCCATCCTGAGACCGCTCGCGAAGCCCAAGATCATCAAAAAGAGCACCAAGT 


TCACTGGGAACCAGTCAGACTGATATGTCAAAATTAAGGGTAACTGGTGGAAACACAG 


AGGTATTGACAACAGGGTTCATAGAAGGTTTGAGGGCCAGATCTATGCCCAACATTGG 


TTATGGGAGAAACAAAAAGACAAAGCACATACTGCCCAGTGGCTTCTGGAAGTTCCTG 


GTCCACAACGTTAAGGAGCTGGAAGTACTGCTGGTGAGCAGAGGAGACAGCAAATG 




ORF Start: TTT at 3 | ORF Stop: TGA at 216

SEQ ID NO: 122 | 71 aa | MW at 8461.8kD

NOV37a, CG58584-01 Protein Sequence


FAVSSAHQQLYWSHPRKFGQGSRSCRVCSNRHGLIRKYGLNMCRQCFRQYAKDIGFIK
KDLSCLPWHCLWR 
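
The relationship between a DNA listing, its stated ORF start and the predicted polypeptide can be checked by conceptual translation. The following is a minimal sketch using the standard genetic code; the function and variable names are hypothetical, only the first 232 bases of the NOV37a DNA listing are entered, and the "at 3" start position is assumed to be 1-based.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_orf(dna, start):
    """Translate from a 1-based start position up to the first stop codon."""
    protein = []
    for i in range(start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon ends the ORF
            break
        protein.append(residue)
    return "".join(protein)

# First four lines of the NOV37a DNA listing (SEQ ID NO: 121).
NOV37A_DNA = (
    "CATTTGCTGTCTCCTCTGCTCACCAGCAGCTGTACTGGAGCCACCCGCGAAAATTCGG"
    "CCAGGGTTCTCGCTCTTGTCGTGTCTGTTCAAACCGGCACGGTCTGATCCGGAAATAT"
    "GGCCTCAATATGTGCCGCCAGTGTTTCCGTCAGTACGCGAAGGATATCGGTTTCATTA"
    "AGAAAGACCTGAGCTGTCTTCCTTGGCACTGCCTATGGAGGTGACACCCATCTCCTCC"
)

nov37a_protein = translate_orf(NOV37A_DNA, 3)
```

The resulting string can be compared residue by residue with the protein listing for SEQ ID NO: 122 to detect transcription errors in either listing.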



Further analysis of the NOV37a protein yielded the following properties shown in 
Table 37B. 



Table 37B. Protein Sequence Properties NOV37a

PSort analysis: 0.6400 probability located in microbody (peroxisome); 0.4500
probability located in cytoplasm; 0.1000 probability located in mitochondrial
matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV37a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 37C. 
Table 37C. Geneseq Results for NOV37a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV37a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAG76128 | Human colon cancer antigen protein SEQ ID NO:6892 - Homo sapiens, 80 aa. [WO200122920-A2, 05-APR-2001] | 7..60; 2..55 | 46/54 (85%); 48/54 (88%) | 4e-24
AAM79084 | Human protein SEQ ID NO 1746 - Homo sapiens, 56 aa. [WO200157190-A2, 09-AUG-2001] | 7..60; 3..56 | 39/54 (72%); 43/54 (79%) | 2e-18
AAG39921 | Arabidopsis thaliana protein fragment SEQ ID NO: 49464 - Arabidopsis thaliana, 637 aa. [EP1033405-A2, 06-SEP-2000] | 7..63; 3..58 | 40/57 (70%); 45/57 (78%) | 2e-18
AAM80068 | Human protein SEQ ID NO 3714 - Homo sapiens, 74 aa. [WO200157190-A2, 09-AUG-2001] | 7..58; 22..73 | 38/52 (73%); 42/52 (80%) | 5e-18
AAG34802 | Arabidopsis thaliana protein fragment SEQ ID NO: 42406 - Arabidopsis thaliana, 56 aa. [EP1033405-A2, 06-SEP-2000] | 7..58; 3..54 | 37/52 (71%); 42/52 (80%) | 1e-17



In a BLAST search of public sequence databases, the NOV37a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 37D. 
Table 37D. Public BLASTP Results for NOV37a

Protein Accession Number | Protein/Organism/Length | NOV37a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
BAB79485 | RIBOSOMAL PROTEIN S29 - Homo sapiens (Human), 56 aa. | 7..60; 3..56 | 53/54 (98%); 53/54 (98%) | 1e-27
P30054 | 40S ribosomal protein S29 - Homo sapiens (Human), 55 aa. | 7..60; 2..55 | 53/54 (98%); 53/54 (98%) | 1e-27
Q90YP2 | 40S RIBOSOMAL PROTEIN S29 - Ictalurus punctatus (Channel catfish), 56 aa. | 7..60; 3..56 | 52/54 (96%); 53/54 (97%) | 2e-27
AAL62474 | RIBOSOMAL PROTEIN S29 - Spodoptera frugiperda (Fall armyworm), 56 aa. | 7..60; 3..56 | 41/54 (75%); 48/54 (87%) | 6e-21
Q9VH69 | CG8495 PROTEIN - Drosophila melanogaster (Fruit fly), 56 aa. | 10..60; 6..56 | 41/51 (80%); 46/51 (89%) | 3e-20



PFam analysis predicts that the NOV37a protein contains the domains shown in the 
Table 37E. 



Table 37E. Domain Analysis of NOV37a

Pfam Domain | NOV37a Match Region | Identities; Similarities for the Matched Region | Expect Value
Ribosomal_S14: domain 1 of 1 | 7..61 | 17/60 (28%); 51/60 (85%) | 7.5e-20



Example 38. 

The NOV38 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 38A. 



Table 38A. NOV38 Sequence Analysis 




SEQ ID NO: 123 | 2039 bp

NOV38a, CG58538-01 DNA Sequence


GCAGCACACCTGCTCTGTGACTGACACTCTTGCAGAAGTGGGGCCACTTCAGGGACAT 


GGACAAGGTGTTGTACCTGCTGTCACAGAGCCTGTTATCTGTTCAGAATGACCGAAGA 


AGCATGCCG AACACGG AGTCAGAAACG AG CG CTTG AACGGG ACCCAACAG AGG ACGAT 
GTGGAGAGCAAGAAAATAAAAATGGAGAGAGGATTGTTGGCTTCAGATTTAAACACTG 
ACGGAGACATGAGGGTX3ACACCTGAGCCGGGAGCAGGTCCAACCCAAGGATTGCTGAG 
GGCAACAGAGGCCACGGCCATGGCCATGGGCAGAGGCGAAGGGCTGGTGGGCGATGGG 
CCCGTGGACATG CGCACCTCACACAGTG ACATGAAGTCCGAGAGGAG ACC CCCCTCAC 
CTGACGTGATTGTGCTCTCCGACAACGAGCAGCCCTCGAGCCCGAGAGTGAATGGGCT 
GACCACGGTGGCCTTGAAGGAGACTAGCACCGAGGCCCTCATGAAAAGCAGTCCTGAA 
GAACGAGAAAGGATGATCAAGCAGCTGAAGGAAGAATTGAGGTTAGAAGAAGCAAAAC
TCGTGTTGTTGAAAAAGTTGCGGCAGAGTCAAATACAAAAGGAAGCCACCGCCCAGAA 
GCCCACAGGTTCTGTTGGGAGCACCGTGACCACCCCTCCCCCGCTTGTTCGGGGCACT 
CAGAACATTCCTGCTGGCAAGCCATCACTCCAGACCTCTTCAG CTCGG ATGCCCGG CA 
GTGTCATACCCCCGCCCCTGGTCCGAGGTGGGCAGCAGGCGTCCTCGAAGCTGGGGCC 
ACAGGCGAGCTCACAGGTCGTCATGCCCCCACTCGTCAGGGGGGCTCAGCAAATCCAC 
AGCATTAGGCAACATTCCAGCACAGGGCCACCGCCCCTCCTCCTGGCCCCCCGGGCGT 
CGGTGCCCAGTGTG CAGATT C AGGG ACAG AGG ATC ATCCAG CAGGGCCTCATCCG CGT 
CGCCAATGTTCCCAACACCAGCCTGCTCGTCAACATCCCACAGCCCACCCCAGCATCA 
CTGAAGGGGACAACAGCCACCTCCGCTCAGGCCAACTCCACCCCCACTAGTGTGGCCT 
CTGTGGT CACCTCTG CCGAGTCTCCAGCAAGCCGACAGGCGGCCGCCAAGCTGG CGCT 
GCGCAAACAGCTGGAGAAGACGCTACTCGAGATCCCCCCACCCAAGCCCCCAGCCCCA 
GAGATGAACTTCCTGCCCAGCGCCGCCAACAACGAGTTCATCTACCTGGTCGGCCTGG 
AGGAGGTGGTGCAGAACCTACTGGAGACACAAGCAGGCAGGATGTCGGCCGCCACTGT 
GCTGTCC CGGGAGCC CT ACATGTG TGC ACAGTGCAAGACGGACTTCACGTGCCG CTGG 
CGGGAGGAGAAGAGCGGCGCCATCATGTGTGAGAACTGCATGACAACCAACCAGAAGA 
AGG CG CTCAAGGTGG AG CACACCAGCCGG CTG AAGGCCGCCTTTGTGAAGG CGCTGCA 
GCAGGAACAGGAGATTGAGCAGCGGCTCCTGCAGCAGGGCACGGCCCCTGCACAGGCC 
AAGGCCGAGCCCACCGCTGCCCCACACCCCGTGCTGAAGCAGGCCTCCAGCCAGCTGT 
CCCGGGGTTCGGCCACGACGCCCCGAGGTGTCCTGCACACGTTCAGTCCGTCACCCAA 
ACTGCAGAACTCAGCCTCGGCCACAGCCCTGGTCAGCAGGACCGGCAGACATTCTGAG 
AGAACCGTGAGCGCCGGCAAGGGCAGCGCCACCTCCAACTGGAAGAAGACGCCCCTCA 
GCACAGGCGGGACCCTTGCGTTTGTCAGCCCAAGCCTGGCGGTGCACAAGAGCTCCTC 
GG CCGTGG ACCG CCAGCG AGAGT ACCTCCTGGACATG ATC CCAC CCCGCTCCATCCC C 
CAGTCAGCCACGTGGAAATAGTGCGAGCCAGGCCCCGTGGAAGACGGGCTCCCTCCTC 
CCCCACCTGGCCCCTGGTCTAGAAGGACCCACTGCACCACCCTCCGCTGGCTCGGGAA 


GACACCGTG 




ORF Start: ATG at 106 


ORF Stop: TAG at 1933 




SEQ ID NO: 124 


609 aa MW at 65295.8kD 


NOV38a, 

CG58538-01 Protein Sequence 


MTEEACRTRSQKRALERDPTEDDVESKKIKMERGLLASDLNTDGDMRVTPEPGAGPTQ 
GLLRATEATAMAMGRGEGLVGDGPVDMRTSHSDMKSERRPPSPDVIVLSDNEQPSSPR 
VNGLTTVALKETSTEALMKSSPEERERMIKQLKEELRLEEAKLVLLKKLRQSQIQKEA 
TAQKPTGSVGSTVTTPPPLVRGTQNIPAGKPSLQTSSARMPGSVIPPPLVRGGQQASS 
KLGPQASSQVVMPPLVRGAQQIHSIRQHSSTGPPPLLLAPRASVPSVQIQGQRIIQQG 
LIRVANVPNTSLLVNIPQPTPASLKGTTATSAQANSTPTSVASVVTSAESPASRQAAA 
KLALRKQLEKTLLEIPPPKPPAPEMNFLPSAANNEFIYLVGLEEVVQNLLETQAGRMS 
AATVLSREPYMCAQCKTDFTCRWREEKSGAIMCENCMTTNQKKALKVEHTSRLKAAFV 
KALQQEQEIEQRLLQQGTAPAQAKAEPTAAPHPVLKQASSQLSRGSATTPRGVLHTFS 
PSPKLQNSASATALVSRTGRHSERTVSAGKGSATSNWKKTPLSTGGTLAFVSPSLAVH 
KSSSAVDRQREYLLDMIPPRSIPQSATWK 
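
The ORF coordinates and the protein length reported in Table 38A can be cross-checked against each other. The following is a minimal Python sketch, assuming that both positions are 1-based and that each refers to the first base of its codon (the source does not state the convention); the function name `orf_protein_length` is introduced here for illustration only.

```python
def orf_protein_length(start: int, stop: int) -> int:
    """Number of residues encoded between a start codon and a stop codon.

    `start` is the 1-based position of the first base of the start codon;
    `stop` is the 1-based position of the first base of the stop codon.
    Both conventions are assumptions, not statements from the source.
    """
    span = stop - start  # bases from the start codon up to, not including, the stop codon
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3


# Coordinates reported in Table 38A for NOV38a.
print(orf_protein_length(106, 1933))  # 609
```

Under these assumptions the result agrees with the 609 aa length listed for NOV38a.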



Further analysis of the NOV38a protein yielded the following properties shown in 
Table 38B. 



Table 38B. Protein Sequence Properties NOV38a 


PSort 
analysis: 


0.4404 probability located in mitochondrial matrix space; 0.3000 probability 
located in microbody (peroxisome); 0.1257 probability located in mitochondrial 
inner membrane; 0.1257 probability located in mitochondrial intermembrane 
space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
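
The PSort summary in Table 38B is a semicolon-separated list of probability and location pairs. As a hedged illustration of how such a string might be ranked programmatically, the sketch below parses it with a regular expression; the helper name `parse_psort` and the pattern are assumptions made for this example and are not part of PSort itself.

```python
import re


def parse_psort(text: str) -> list[tuple[str, float]]:
    """Extract (location, probability) pairs, highest probability first."""
    pattern = re.compile(r"(\d*\.\d+) probability located in ([^;]+)")
    pairs = [(loc.strip(), float(p)) for p, loc in pattern.findall(text)]
    return sorted(pairs, key=lambda item: item[1], reverse=True)


# PSort text as reported for NOV38a in Table 38B.
psort_38a = (
    "0.4404 probability located in mitochondrial matrix space; "
    "0.3000 probability located in microbody (peroxisome); "
    "0.1257 probability located in mitochondrial inner membrane; "
    "0.1257 probability located in mitochondrial intermembrane space"
)
ranked = parse_psort(psort_38a)
print(ranked[0])  # ('mitochondrial matrix space', 0.4404)
```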



A search of the NOV38a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 38C. 



Table 38C. Geneseq Results for NOV38a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV38a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM00991 


Human bone marrow protein, SEQ ID 
NO: 492 - Homo sapiens, 502 aa. 
[WO200153453-A2, 26-JUL-2001] 


1..471 
4..473 


217/504(43%) 
290/504 (57%) 


2e-87 


AAM00944 


Human bone marrow protein, SEQ ID 
NO: 420 - Homo sapiens, 546 aa. 
[WO200153453-A2, 26-JUL-2001] 


1..471 
48..517 


217/504(43%) 
290/504 (57%) 


2e-87 


AAM00831 


Human bone marrow protein, SEQ ID 
NO: 194 - Homo sapiens, 266 aa. 
[WO200153453-A2, 26-JUL-2001] 


1..197 
47..262 


84/217 (38%) 
110/217(49%) 


1e-23 


AAM85818 


Human immune/haematopoietic 
antigen SEQ ID NO: 1341 1 - Homo 
sapiens, 84 aa. [WO200157182-A2, 
09-AUG-2001] 


417..471 
1..55 


41/55 (74%) 
49/55 (88%) 


7e-19 
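
Each Identities/Similarities cell in Table 38C pairs a match count with an alignment length and a percentage. The sketch below, in Python, parses one such cell and recomputes the percentage. Rounding down to a whole number is an assumption that reproduces the identity figures checked here; it is not guaranteed to reproduce every percentage in the tables. The function names are illustrative.

```python
import re


def parse_fraction(cell: str) -> tuple[int, int, int]:
    """Parse a cell such as '217/504 (43%)' into (matches, length, reported_percent)."""
    m = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    matches, length, percent = (int(g) for g in m.groups())
    return matches, length, percent


def truncated_percent(matches: int, length: int) -> int:
    """Integer percentage, rounded down (an assumed convention)."""
    return (100 * matches) // length


# Identity cell for the first Geneseq hit in Table 38C.
matches, length, reported = parse_fraction("217/504(43%)")
print(truncated_percent(matches, length), reported)  # 43 43
```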



In a BLAST search of public sequence databases, the NOV38a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 38D. 



Table 38D. Public BLASTP Results for NOV38a 



Protein 
Accession 
Number 



Protein/Organism/Length 



NOV38a 
Residues/ 

Match 
Residues 



Identities/ 
Similarities for the 
Matched Portion 



Expect 
Value 



No Significant Matches Found 



PFam analysis predicts that the NOV38a protein contains the domains shown in the 
Table 38E. 



Table 38E. Domain Analysis of NOV38a 


Pfam Domain 


NOV38a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


GATA: domain 1 of 1 


414..453 


12/43 (28%) 
17/43 (40%) 


1.1 



Example 39. 

The NOV39 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 39A. 



Table 39A. NOV39 Sequence Analysis 




SEQ ID NO: 125 


1421 bp 


NOV39a, 

CG59371-01 DNA Sequence 


ACCATTTCAGAGATGTCTTCCAGAAGTACCAAAGATTTAATTAAAAGTAAGTGGGGAT 
CGAAGCCTAGTAACTCCAAATCCGAAACTACATTAGAAAAATTAAAGGGAGAAATTGC 
ACACTTAAAGACATCAGTGGATGAAATCACAAGTGGGAAAGGAAAGCTGACTGATAAA 
GAGAGACACAGACTTTTGGAGAAAATTCGAGTCCTTGAGGCTGAGAAGGAGAAGAATG 
CTTATCAACTCACAGAGAAGGACAAAGAAAT ACAG CG ACTG AGAGAC CAACTGAAGGC 
CAGATATAGTACTACCACATTGCTTGAACAGCTGGAAGAGACAACGAGAGAAGGAGAA 
AGGAGGGAGCAGGTGTTGAAAGCCTTATCTGAAGAGAAAGACGTATTGAAACAACAGT 
TGTCTGCTGCAACCTCACGAATTGCTGAACTTGAAAGCAAAACCAATACACTCCGTTT 
ATCACAGACTGTGGCTCCAAACTGCTTCAACTCATCAATAAATAATATTCATGAAATG 
GAAATACAGCTG AAAGATGCTCTGG AGAAAAAT CAG CAGTGGCTCGTGTATG ATCAGC 
AGCGGGAAGTCTATGTAAAAGGACTTTTAGCAAAGATCTTTGAGTTGGAAAAGAAAAC 
GGAAACAGCTGCTCATTCACTCCCACAGCAGACAAAAAAGCCTGAATCAGAAGGTTAT 
CTTCAAGAAGAGAAGCAGAAATGTTACAACGATCTCTTGGCAAGTGCAAAAAAAGATC 
TTGAGGTTGAACGACAAACCATAACTCAGCTGAGTTTTGAACTGAGTGAATTTCGAAG 
AAAAT ATG AAG AAAC C CAAAAAGAAG TTCACAATTT AAAT CAG CTGTTG T ATT C ACAA 
AGAAGGGCAGATGTGCAACATCTGGAAGATGATAGGCATAAAACAGAGAAGATACAAA 
AACT CAGG GAAG AG AATG AT ATTG CT AGGGG AAAACTTG AAG AAG AG AAG AAG AG AT C 
CG AAGAGCTCTTAT CTCAGGTCCAGTCTCTTTACACATCTCTGCTAAAGCAG CAAGAA 
GAACAAACAAGGGTAGCTCTGTTGGAACAACAGATGCAGGCATGTACTTTAGACTTTG 
AAAATGAAAAACTCGACCGTCAACATGTCCAGCATCAATTGCATGTAATTCTTAAGGA 
G CTC CG AAAAG CAAG AAAAAAT AT AACA CAGTTG G AAT C CTTG AAACAG CTTCATG AG 
TTTGCCATCACAGAGCCATTAGTCACTTTCCAAGGAGAGACTGAAAACAGAGAAAAAG 

ttgccgcctcaccaaaaagtcccactgctgcactcaatggaagcct^tggaatgtcc 
caagtgc^tatacagtatcc^gccactgagcatcgcgatctgcttgtccatgtggaa 
tactgttcaaagtagcaaaataagtattt 




ORF Start: ATG at 13 


ORF Stop: TAG at 1405 




SEQ ID NO: 126 


464 aa 


MW at 54045.6kD 


NOV39a, 

CG59371-01 Protein Sequence 


MSSRSTKDLIKSKWGSKPSNSKSETTLEKLKGEIAHLKTSVDEITSGKGKLTDKERHR 
LLEKIRVLEAEKEKNAYQLTEKDKEIQRLRDQLKARYSTTTLLEQLEETTREGERREQ 
VLKALSEEKDVLKQQLSAATSRIAELESKTNTLRLSQTVAPNCFNSSINNIHEMEIQL 
KDALEKNQQWLVYDQQREVYVKGLLAKIFELEKKTETAAHSLPQQTKKPESEGYLQEE 
KQKCYNDLLASAKKDLEVERQTITQLSFELSEFRRKYEETQKEVHNLNQLLYSQRRAD 
VQHLEDDRHKTEKIQKLREENDIARGKLEEEKKRSEELLSQVQSLYTSLLKQQEEQTR 
VALLEQQMQACTLDFENEKLDRQHVQHQLHVILKELRKARKNITQLESLKQLHEFAIT 
EPLVTFQGETENREKVAASPKSPTAALNGSLVECPKCNIQYPATEHRDLLVHVEYCSK 
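
The relationship between the DNA sequence, the ORF start position and the predicted protein in Table 39A can be illustrated by translating codons from the start position onward. The following Python sketch assumes the standard genetic code and a 1-based start position; `translate_orf` and `CODON_TABLE` are names chosen for this example and do not describe the software used to generate the table.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(dna: str, start: int) -> str:
    """Translate from the 1-based `start` position until the first stop codon."""
    seq = "".join(dna.split()).upper()  # drop whitespace left by text extraction
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks a codon with unreadable bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 36 bases of the NOV39a DNA sequence (ORF start: ATG at 13).
print(translate_orf("ACCATTTCAGAGATGTCTTCCAGAAGTACCAAAGAT", 13))  # MSSRSTKD
```

The output matches the first eight residues of the NOV39a protein sequence.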



Further analysis of the NOV39a protein yielded the following properties shown in 
Table 39B. 



Table 39B. Protein Sequence Properties NOV39a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV39a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 39C. 



Table 39C. Geneseq Results for NOV39a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV39a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB92925 


Human protein sequence SEQ ID 
NO: 11576 - Homo sapiens, 231 aa. 
[EP1074617-A2, 07-FEB-2001] 


170..392 
1..223 


222/223 (99%) 
222/223 (99%) 


e-122 


AAG75490 


Human colon cancer antigen protein 
SEQ ID NO:6254 - Homo sapiens, 
165 aa. [WO200122920-A2, 05-APR- 
2001] 


1..67 
99..165 


64/67 (95%) 
64/67 (95%) 


1e-28 


AAM78520 


Human protein SEQ ID NO 1182 - 
Homo sapiens, 990 aa. 
[WO200157190-A2, 09-AUG-2001] 


6..394 
515..929 


96/421 (22%) 
182/421 (42%) 


3e-12 


AAM41000 


Human polypeptide SEQ ID NO 5931 
- Homo sapiens, 1988 aa. 
[WO200153312-A1, 26-JUL-2001] 


70..420 
852..1203 


90/384 (23%) 
161/384(41%) 


3e-12 


AAM40999 


Human polypeptide SEQ ID NO 5930 
- Homo sapiens, 1988 aa. 
[WO200153312-A1, 26-JUL-2001] 


70..420 
852..1203 


90/384 (23%) 
161/384(41%) 


3e-12 
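
Expect values in these tables appear in several spellings, including a bare exponent such as e-122. A small Python helper can normalize them before hits are compared; the sketch below assumes that a leading "e" is shorthand for a mantissa of 1. The name `parse_expect` is hypothetical.

```python
def parse_expect(value: str) -> float:
    """Convert an Expect string such as '3e-12', 'e-122' or '0.0' to a float."""
    text = value.strip().lower()
    if text.startswith("e"):
        text = "1" + text  # assumed meaning: 'e-122' stands for 1e-122
    return float(text)


# Two Geneseq hits from Table 39C, ordered from most to least significant.
hits = {"AAM78520": "3e-12", "AAB92925": "e-122"}
for accession, expect in sorted(hits.items(), key=lambda kv: parse_expect(kv[1])):
    print(accession, parse_expect(expect))
```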



In a BLAST search of public sequence databases, the NOV39a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 39D. 



Table 39D. Public BLASTP Results for NOV39a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV39a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96H32 


SIMILAR TO RIKEN CDNA 
1200008O12 GENE - Homo sapiens 
(Human), 464 aa. 


1..464 
1..464 


458/464 (98%) 
458/464 (98%) 


0.0 


Q9DBZ8 


1200008O12RIK PROTEIN - Mus 
musculus (Mouse), 462 aa. 


1..464 
1..462 


348/464 (75%) 
401/464 (86%) 


0.0 


Q9NVS7 


NT2RP2001245 - Homo sapiens 
(Human), 231 aa. 


1..223 


222/223 (99%) 


e-122 


Q9CZP8 


2700032M20RIK PROTEIN - Mus 
musculus (Mouse), 189 aa. 


1..176 
1..176 


121/176 (68%) 
150/176 (84%) 


3e-63 


Q9VJE5 


CLIP-190 PROTEIN - Drosophila 
melanogaster (Fruit fly), 1690 aa. 


4..439 
675..1118 


108/461 (23%) 
203/461 (43%) 


2e-16 



PFam analysis predicts that the NOV39a protein contains the domains shown in the 
Table 39E. 



Table 39E. Domain Analysis of NOV39a 



Pfam Domain 



NOV39a Match Region 



Identities/ 
Similarities 
for the Matched Region 



Expect Value 



No Significant Matches Found 



Example 40. 

The NOV40 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 40A. 



Table 40A. NOV40 Sequence Analysis 



SEQ ID NO: 127 


3955 bp 



NOV40a, 

CG59346-01 DNA Sequence 



TGCACCCTCGCTGCCT C CTTTCCTCCATGCTGCCTGGATCTGG CG AGCTGGGGTGATT 



AATTGGCTATGATGATGAACGTCCCCGGCGGAGGAGCGGCCGCGGTGATGATGACGGG 



CTACAATAATGGTCGCTGTCCCCGGAATTCTCTCTACAGTGACTGCATTATTGAGGAG 
AAGACGGTGGTCCTGCAGAAAAAAGACAATGAGGGCTTTGGATTCGTGCTTCGAGGGG 
CCAAAGCTGACACACCCATTGAAGAATTCACACCAACACCGGCTTTCCCAGCCCTACA 
GTACCTGG AGTCCGTGGATGAAGGTGGGGTGG CGTGG CAAGCCGGACT AAGG ACCGGG 
GACTTCTTGATTGAGGTTAACAATGAGAATGTTGTCAAAGTCGGCCACAGGCAGGTGG 
TGAACATGATCCGGCAGGGAGGGAATCACCTGGTCCTTAAGGTGGTCACGGTGACCAG 
GAATCTGGACCCCGACGACACCGCCAGGAAGAAAG CTCCCCCG CCTCCAAAG CGGGCA 
CCGACCACAGCCCTCACCCTGCGCTCCAAGTCCATGACCTCGGAGCTGGAGGAGCTCG 
ATAAACCCG AGGAG ATAGTCCCGG C CTCCAAGCCCTCCCG CGCTG CTGAGAACATGGC 
TGTGG AACCG AGGGTGGCGACCATCAAG CAGCGG CCCAGCAGCCGGTGCTTCCCGGCG 
GGCTCAG ACATGAACGTG AGTGG CCGTAC CTTGGG ACCACGAGGGCGGGGGCCGACGG 
TGCCCCCTAGGCTCTCTGGTTTGCAGTCTGTGTACGAACGCCAAGGAATCGCCGTGAT 
GACGCCCACTGTTCCTGGGAGCCCAAAAGCCCCGTTTCTGGGCATCCCTCGAGGTACG 
ATGCG AAGG CAGAAATCAATAGG AATAACAG AGGAAG AGCGGCAGTTTCTGGCTCCT C 
CAATGCTGAAGTTCACCAGAAGCCTGTCCATGCCGGACACCTCTGAGGACATCCCCCC 
TCCACCGCAGTCTGTGCCCCCGTCCCCACCACCACCTTCCCCAACCACTTACAACTGC 
CCCAAGTCCCCAACTCCAAGAGTCTACGGGACGATTAAGCCTGCGTTCAATCAGAATT 
CTG CCGCCAAGGTGTCCC C CGC CAC CAGGTCCGACACCGTGGC CAC CATG ATG AGGGA 
GAAGGGGATGTACTTCAGGAGAGAGCTGGACCGCTACTCCTTGGACTCTGAAGACCTC 
TACAGTCGGAATGCCGGCCCGCAAGCCAACTTCCGCAACAAGAGAGGCCAGATGCCAG 
AAAACCCATACTCAGAGGTGGGGAAGATCGCCAGCAAAGCCGTCTACGTCCCCGCCAA 
GCCCGCCAGGCGGAAGGGGATGCTGGTGAAGCAGTCCAACGTGGAGGACAGCCCCGAG 
AAGACGTGCTCCATCCCTATCCCGACCATCATCGTGAAGGAGCCGTCCACCAGCAGCA 
G CGGCAAG AGCAGC CAGGG CAGCAGCATGGAGATCG ACCCCCAGGCCCCGGAGCCACC 
GAG CCAGCTGCGGC CTGACG AAAGCCTG ACCGTCAGCAG CCC CTTTGCCG CCGCCATC 
G CCGG AG CCGTCCG CGACCGTG AGAAGCGG CTGG AAGCCAGG AGGAACTC CCCGGCCT 
TCCTCTCCACAGACCTGGGGGATGAGGATGTGGGCCTGGGGCCACCCGCCCCCAGGAC 
GCGGCCCT CCATCTTCCCCGAGGAGGGGGATTTTGCTG ACGAGGACAG CG CTGAGCAG 
CTGTCATCCCCCATGCCGAGTGCCACGCCCAGGGAGCCCGAAAACCATTTCGTGGGTG 
GCG CCGAGG C CAGTGCTCCGGGTG AGGCTGGGAGG CCG CTGAATTCCACGTCCAAAGC 
CCAGGGGCCCGAGAGCAGCCCAGCAGTGCCCTCCGCGAGCAGCGGCACAGCCGGCCCC 
GGGAATTATGTCCACCCACTCACAGGGCGGCTGCTTGATCCCAGCTCCCCGCTGGCCC 
TGGCACTCTCCGCAAGGGACCGAGCCATGAAGGAGTCTCAACAGGGACCCAAAGGGGA 
GGCCCCCAAGGCCGACCTCAACAAACCTCTTTACATTGATACCAAAATGCGGCCCAGC 
CTGGATGCCGGCTTCCCTACGGTCACCAGGCAGAACACCCGGGGACCCCTGAGGCGGC 
AGGAGACGGAGAACAAGTACGAGACCGACCTGGGCCGAGACCGGAAAGGCGATGACAA 
GAAGAACATGCTGATCGACATCATGGACACGTCCCAGCAGAAGTCGGCTGGCCTGCTG 
ATGGTGCACACCGTGGACGC C ACT AAG CTGGACAACGCCCTGCAGGAAG AGG ACGAGA 
AGGCAGAGGTGGAGATGAAGCCAGACAGCTCGCCGTCCGAGGTGCCAGAAGGTGTTTC 
CGAAACCGAAGGTGCTTTACAGATCTCCGCTGCCCCCGAGCCCACCACCGTGCCCGGC 
AGAACCATCGTCGCGGTGGGCTCCATGGAAGAGGCGGTGATTTTGCCATTCCGCATCC 
CTCCTCCCCCTCTGGCATCCGTGGACTTGGATGAGGATTTTATTTTTACAGAGCCATT 
GCCTCCTCCCCTGGAATTTGCAAATAGTTTTGATATCCCCGATGACCGGGCAGCTTCT 
GTCCCGGCTCTCTCAGACTTAGTGAAGCAGAAGAAAAGCGACACCCCTCAGTCCCCTT 
CGTTGAACTCCAGCCAACCAACCAACTCTGCAGACAGCAAGAAGCCAGCCAGTCTTTC 
AAACTGTCTGCCTGCCTCATTCCTGCCACCCCCTGAAAGCTTTGACGCCGTCGCCGAC 
TCTGGGATCG AGGAGGTGGACAG CCGGAGTAGCAG CG ACCACCAC CTCG AGACGACCA 
GCACTAT CT CCACCGTGTCTAGCATCTCCACCCTGT CTTCCGAAGGTGG AG AGAATGT 
GG ACACCTG CACAGTCTATGCAGATGGG CAAGCATTT ATGGTTGACAAACCCCCAGTA 
CCTCCTAAGCCAAAAATGAAGCCCATCATTCACAAAAGCAATGCACTTTATCAAGACG 
CGCTCGTGGAAGAAGATGTAGATAG CTTTGTTATC CCCCCG CCCGCTCCCCCGC CCCC 
GCCGGGCAGTGCCCAGCCTGGGATGGCCAAGGTTCTCCAGCCAAGGACCTCCAAGTTG 
TGGGGCG ACGTCACAG AG ATCAAAAGCCCGATTCTCTCAGG CC C AAAGGCAAACGTTA 
TTAGTGAATTGAACTCTATCCTACAGCAAATGAACCGAGAGAAATTGGCAAAGCCGGG 
GG AAGGACTGGATTCACC AATGGGAGC CAAGTCCGCCAGCCTCGCTCCAAGAAGCC CG 
GAGATCATGAG C ACCATCT CAGGTACACGGAGCACGACGGTCACCTTCACTGTTCG C C 
CCGGCACCTCCCAGCCCATCACCCTGCAGAGCCGGCCCCCCGACTATGAAAGCAGGAC 
CTCAGGAACAAGACGTG CCCCAAGCCCTGTGGTCT CGCCAACAGAGATG AACAAAGAG 
ACCCTGCCCGCCCCCCTGTCTGCTGCCACCGCCTCTCCTTCTCCCGCTCTCTCAGATG 
TCTTTAGCCTTCCAAGCCAGCCCCCTTCTGGGGATCTATTTGGCTTGAACCCAGCGGG 
ACGCAGTAGGTCGCCATCCCCCTCGATACTGCAACAGCCAATCTCAAATAAGCCTTTT 
ACAACTAAACCTGTCCACCTGTGGACTAAACCAGATGTGGCCGATTGGCTGGAAAGTC 
TAAACTTGGGTGAACATAAAGAGGCCTTCATGGACAATGAGATCGATGGCAGTCACTT 
ACCAAACCTGCAGAAGGAGGACCTCATCGATCTTGGGGTAACTCGAGTCGGGCACAGA 
ATGAACATAG AAAGGGCTTTG AAACAGCTGCTGG ACAGAT AAGGACGGCTG CTCT CCA 
CCTCG CAGACTGCTCTTGTT ATAAGTAGAGATGGGCTCGTG CTGAAACATCTG AATGC 


CAAGCGAAGTC 




ORF Start: ATG at 67 


ORF Stop: TAA at 3868 




SEQ ID NO: 128 


1267 aa 


MW at 136108.7kD 


NOV40a, 

CG59346-01 Protein Sequence 


MMMNVPGGGAAAVMMTGYNNGRCPRNSLYSDCIIEEKTVVLQKKDNEGFGFVLRGAKA 
DTPIEEFTPTPAFPALQYLESVDEGGVAWQAGLRTGDFLIEVNNENVVKVGHRQVVNM 
IRQGGNHLVLKVVTVTRNLDPDDTARKKAPPPPKRAPTTALTLRSKSMTSELEELDKP 
EEIVPASKPSRAAENMAVEPRVATIKQRPSSRCFPAGSDMNVSGRTLGPRGRGPTVPP 
RLSGLQSVYERQGIAVMTPTVPGSPKAPFLGIPRGTMRRQKSIGITEEERQFLAPPML 
KFTRSLSMPDTSEDIPPPPQSVPPSPPPPSPTTYNCPKSPTPRVYGTIKPAFNQNSAA 
KVSPATRSDTVATMMREKGMYFRRELDRYSLDSEDLYSRNAGPQANFRNKRGQMPENP 
YSEVGKIASKAVYVPAKPARRKGMLVKQSNVEDSPEKTCSIPIPTIIVKEPSTSSSGK 
SSQGSSMEIDPQAPEPPSQLRPDESLTVSSPFAAAIAGAVRDREKRLEARRNSPAFLS 
TDLGDEDVGLGPPAPRTRPSMFPEEGDFADEDSAEQLSSPMPSATPREPENHFVGGAE 
ASAPGEAGRPLNSTSKAQGPESSPAVPSASSGTAGPGNYVHPLTGRLLDPSSPLALAL 
SARDRAMKESQQGPKGEAPKADLNKPLYIDTKMRPSLDAGFPTVTRQNTRGPLRRQET 
ENKYETDLGRDRKGDDKKNMLIDIMDTSQQKSAGLLMVHTVDATKLDNALQEEDEKAE 
VEMKPDSSPSEVPEGVSETEGALQISAAPEPTTVPGRTIVAVGSMEEAVILPFRIPPP 
PLASVDLDEDFIFTEPLPPPLEFANSFDIPDDRAASVPALSDLVKQKKSDTPQSPSLN 
SSQPTNSADSKKPASLSNCLPASFLPPPESFDAVADSGIEEVDSRSSSDHHLETTSTI 
STVSSISTLSSEGGENVDTCTVYADGQAFMVDKPPVPPKPKMKPIIHKSNALYQDALV 
EEDVDSFVIPPPAPPPPPGSAQPGMAKVLQPRTSKLWGDVTEIKSPILSGPKANVISE 
LNSILQQMNREKLAKPGEGLDSPMGAKSASLAPRSPEIMSTISGTRSTTVTFTVRPGT 
SQPITLQSRPPDYESRTSGTRRAPSPVVSPTEMNKETLPAPLSAATASPSPALSDVFS 
LPSQPPSGDLFGLNPAGRSRSPSPSILQQPISNKPFTTKPVHLWTKPDVADWLESLNL 
GEHKEAFMDNEIDGSHLPNLQKEDLIDLGVTRVGHRMNIERALKQLLDR 
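
The molecular weight listed with each protein sequence can be approximated from the residue composition. The sketch below is a minimal Python illustration and not the method used to produce the table values: it sums standard average residue masses in daltons and adds one water for the free termini. The function name and the mass table are supplied here for illustration. Because the printed sequences may carry extraction damage, a value computed from the text need not match the listed figure exactly.

```python
# Average residue masses in daltons; one water is added per chain.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(protein: str) -> float:
    """Approximate average molecular weight of an unmodified polypeptide."""
    seq = "".join(protein.split()).upper()  # tolerate whitespace inside the sequence
    unknown = set(seq) - RESIDUE_MASS.keys()
    if unknown:
        raise ValueError(f"unrecognized residue symbols: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# C-terminal residues of NOV40a as printed in Table 40A.
print(round(molecular_weight("RALKQLLDR"), 1))
```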



Further analysis of the NOV40a protein yielded the following properties shown in 
Table 40B. 



Table 40B. Protein Sequence Properties NOV40a 



PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV40a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 40C. 



Table 40C. Geneseq Results for NOV40a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV40a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM79240 


Human protein SEQ ID NO 1902 - 
Homo sapiens, 1248 aa. 
[WO200157190-A2, 09-AUG- 
2001] 


14..1267 
1..1248 


1231/1271 (96%) 
1231/1271 (96%) 


0.0 


AAB31518 


Amino acid sequence of the rat 
Shank2 polypeptide - Rattus sp, 
1470 aa. [WO200078921-A2, 28- 
DEC-2000] 


30..1267 
240..1470 


1078/1255 (85%) 
1132/1255 (89%) 


0.0 


AAM80224 


Human protein SEQ ID NO 3870 - 
Homo sapiens, 1161 aa. 
[WO200157190-A2, 09-AUG- 
2001] 


172..1267 
82..1161 


1071/1103(97%) 
1071/1103(97%) 


0.0 


AAB31517 


Amino acid sequence of the rat 
Shank3a polypeptide - Rattus sp, 
1740 aa. [WO200078921-A2, 28- 
DEC-2000] 


18..1264 
550..1737 


496/1349 (36%) 
673/1349 (49%) 


0.0 


AAY83017 


Rat shank 3a - Rattus rattus, 1740 
aa. [WO200011204-A2, 02-MAR- 
2000] 


18..1264 
550..1737 


496/1349 (36%) 
673/1349(49%) 


0.0 



In a BLAST search of public sequence databases, the NOV40a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 40D. 



Table 40D. Public BLASTP Results for NOV40a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV40a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for the 
Matched Portion 


Expect 
Value 


Q9UPX8 


KIAA1022 PROTEIN - Homo 
sapiens (Human), 1131 aa 
(fragment). 


124..1267 
131 


1121/1154 (97%) 
1121/1154 (97%) 


0.0 


Q9QX93 


PROLINE RICH SYNAPSE 
ASSOCIATED PROTEIN 1 - 
Rattus norvegicus (Rat), 1252 aa. 


2..1267 
1..1252 


1103/1276 (86%) 
1158/1276 (90%) 


0.0 


O70470 


CORTACTIN-BINDING 
PROTEIN 1 - Rattus norvegicus 
(Rat), 1252 aa. 


2..1267 
1..1252 


1102/1276 (86%) 
1158/1276 (90%) 


0.0 


Q9WUV9 


PROLINE RICH SYNAPSE 
ASSOCIATED PROTEIN 1 - 
Rattus norvegicus (Rat), 1259 aa. 


2..1267 
1..1259 


1103/1283(85%) 
1158/1283 (89%) 


0.0 


Q9WUW0 


PROLINE RICH SYNAPSE 
ASSOCIATED PROTEIN 1 - 
Rattus norvegicus (Rat), 1250 aa. 


2..1267 
1..1250 


1095/1276 (85%) 
1151/1276 (89%) 


0.0 



PFam analysis predicts that the NOV40a protein contains the domains shown in the 
Table 40E. 



Table 40E. Domain Analysis of NOV40a 


Pfam Domain 


NOV40a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


PDZ: domain 1 of 1 


38..131 


23/97 (24%) 
70/97 (72%) 


1e-07 


SAM: domain 1 of 1 


1202..1265 


27/68 (40%) 
53/68 (78%) 


9.8e-22 
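
The match regions in Table 40E are inclusive residue ranges written as start..end. As a hedged illustration, the Python sketch below parses such a range and reports its length; `parse_region` and `region_length` are names chosen for this example, and the parser tolerates stray spaces inside the range.

```python
def parse_region(region: str) -> tuple[int, int]:
    """Parse an inclusive residue range written as 'start..end'."""
    start_text, end_text = region.replace(" ", "").split("..")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"range runs backwards: {region!r}")
    return start, end


def region_length(region: str) -> int:
    """Number of residues covered by an inclusive range."""
    start, end = parse_region(region)
    return end - start + 1


# Pfam match regions reported for NOV40a in Table 40E.
domains = {"PDZ": "38..131", "SAM": "1202..1265"}
for name, region in domains.items():
    print(name, region_length(region))
# PDZ 94
# SAM 64
```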



Example 41. 



The NOV41 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 41A. 



Table 41A. NOV41 Sequence Analysis 




SEQ ID NO: 129 


2069 bp 


NOV41a, 

CG57814-01 DNA Sequence 


GGACACTGACATGGACTGAAGGAGTAGAAAGCACTATAAATGTCTTTCCTTATCTGTG 


TGTACTCTTATCTCACTGTTCTATTTTTTCTCCTCATTTATATTAACTCTTTCTTACC 


TTTTTTTCTGAACTTCTAGGCCTTCTCTTTCCAGAACTGGTGGAAGACAAATGAAACG 


GCCAAGATGGT AAGAAACAAGC CG CATTTCTCCTTGGGG AGACTGAT AATTTAAAAGG 


TTTGTTGTGTCAGAAACATTCCCAGCTTCATCACCAACCCTTTCCTTCCACCTCTGCC 


CACTGGAGACCACTTACATCCCGAAGCGGACGCGGCAGCTGAAGTCAGGAAACCATGC 


ATC ACATTAG C AGGAG CCAACTGCAGACTTT AAACTCCGTTC AAC ATGTGG ATG CGGC 


AGAG AAATGACC TGTCC AG ACAAGCCGGGGCAG CTCATAAACTGGTTCATCTG CTC CC 
TGTGCGTCCCGCGGGTGCGTAAGCTCTGGAGCAGCCGGCGTCCAAGGACCCGGAGAAA 
CCTTCTGCTGGGCACTGCGTGTGCCATCTACTTGGGCTTCCTGGTGAGCCAGGTGGGG 
AGGGCCTCTCTCCAGCATGGACAGGCGGCTGAGAAGGGGCCACATCGCAGCCGCGACA 
CCGCCGAGCCATCCTTCCCTGAGATACCCCTGGATGGTACCCTGGCCCCTCCAGAGTC 
CCAGGGCAATGGGTCCACTCTGCAGCCCAATGTGGTGTACATTACCCTACGCTCCGAG 
CGCAGCAAGCCGGCCAATATCCGTGGCACCGTGAAGCCCAAGCGCAGGAAAAAGCATG 
CAGTGGCATCGGCTGCCCCAGGGCAGGAGGCTTTGGTCGGACCATCCCTTCAGCCGCA 
GGAAGCGGCAAGGGAAGCTGATGCTGTAGCACCTGGGTACGCTCAGGGAGCAAACCTG 
GTTAAGATTGGAG AGCG AC CCTGGAGG TTGGTG CGGGG TCCGGGAGTGCG AGCCGGGG 
GCC CAGACTTCCTG CAG CCCAGCTCCAGGGAGAGCAACATTAGGAT CT ACAGCGAGAG 
CG CCC CCTCCTGGCTGAGC AAAGATG ACATCCGAAGAATG CGACTCTTGG CGGACAGC 
GCAGTGGCAGGGCTCCGGCCTGTGTCCTCTAGGAGCGGAGCCCGTTTGCTGGTGCTGG 
AGGGGGGCGCACCTGGCGCTGTGCTCCGCTGTGGCCCTAGCCCCTGTGGGCTTCTCAA 
GCAGCCCTTGGACATGAGTGAGGTGTTTGCCTTCCACCTAGACAGGATCCTGGGGCTC 
AAC AGG ACCCTG CCGTCTGTGAG CAGG AAAGCAG AGTTCATCC AAGATGGC CGCCCAT 
GCCCCATCATTCTTTGGGATGCATCTTTATCTTCAGCAAGTAATGACACCCATTCTTC 
TGTTAAGCTCACCTGGGGAACTTATCAGCAGTTGCTGAAACAGAAATGCTGGCAGAAT 
GGCCGAGTACCCAAG CCTG AAT C AGGTTGTACTG AAAT AC ATCATC ATG AGTGGTCCA 
AGATGGCACTCTTTGATTTTTTGTTACAGATTTATAATCGCTTAGATACAAATTGCTG 
TGGATTCAGACCTCGCAAGGAAGATGCCTGTGTACAGAATGGATTGAGGCCAAAATGT 
G ATGACCAAGGTTCTGCGG CTCTAGCACACATTATCCAG CG AAAG CATGACC CAAGG C 
ATTTGGTTTTTATAGACAACAAGGGTTTCTTTGACAGGAGTGAAGATAACTTAAACTT 
CAAATTGTTAGAAGGCATCAAAGAGTTTCCAGCTTCTGCAGTTTCTGTTTTGAAGAGC 
CAGCACTTACGGCAGAAACTTCTTCAGTCTCTGTTTCTTGATAAAGTGTATTGGGAAA 
GTCAAGGAGGTAGACAAGGAATTGAAAAGCTTATCGATGTAATAGAACACAGAGCCAA 
AATTCTTATCACCTATATCAATGCACACGGGGTCAAAGTATTACCTATGAATGAATGA 
CAAAAGAATCTTCTGGCTAGGGTGTTAGATATATTTATGCATTTTTGGTTTTGTTTTT 


AAATCAAGCAC AT CAAC CTCAAGCCCGTTTAGCAATG AG 




ORF Start: ATG at 413 


ORF Stop: TGA at 1970 




SEQ ID NO: 130 


519 aa MW at 57552.4kD 


NOV41a, 

CG57814-01 Protein Sequence 


MTCPDKPGQLINWFICSLCVPRVRKLWSSRRPRTRRNLLLGTACAIYLGFLVSQVGRA 
SLQHGQAAEKGPHRSRDTAEPSFPEIPLDGTLAPPESQGNGSTLQPNVVYITLRSERS 
KPANIRGTVKPKRRKKHAVASAAPGQEALVGPSLQPQEAAREADAVAPGYAQGANLVK 
IGERPWRLVRGPGVRAGGPDFLQPSSRESNIRIYSESAPSWLSKDDIRRMRLLADSAV 
AGLRPVSSRSGARLLVLEGGAPGAVLRCGPSPCGLLKQPLDMSEVFAFHLDRILGLNR 
TLPSVSRKAEFIQDGRPCPIILWDASLSSASNDTHSSVKLTWGTYQQLLKQKCWQNGR 
VPKPESGCTEIHHHEWSKMALFDFLLQIYNRLDTNCCGFRPRKEDACVQNGLRPKCDD 
QGSAALAHIIQRKHDPRHLVFIDNKGFFDRSEDNLNFKLLEGIKEFPASAVSVLKSQH 
LRQKLLQSLFLDKVYWESQGGRQGIEKLIDVIEHRAKILITYINAHGVKVLPMNE 




SEQ ID NO: 131 


1740 bp 


NOV41b, 

CG57814-02 DNA Sequence 


GG CAG CTG AAGTC AGG AAACCATGCATC AC ATT AG CAGG AGCCAACTGCAG ACTTT AA 


ACTCCGTTCAACATGTGGATGCGGCAGAGAAATGACCTGTCCAGACAAGCCGGGGCAG 
CTCATAAACTGGTTCATCTGCTCCCTGTGCGTCCCGCGGGTGCGTAAGCTCTGGAGCA 
GCCGGCGTCCAAGGACCCGGAGAAACCTTCTGCTGGGCACTGCGTGTGCCATCTACTT 
GGG CTTCCTGGTGAGCCAGGTGGGGAGGG CCTCTCTCCAGCATGGACAGGCGGCTGAG 
AAGGGG CCACATCGCAGCCGCGACACCGCCGAGCCATCCTTCC CTGAGATACCCCTGG 
ATGGTACCCTGGCCCCTCCAGAGTCCCAGGGCAATGGGTCCACTCTGCAGCCCAATGT 
GGTGTACATTACCCTACGCTCCAAGCGCAGCAAGCCGGCCAATATCCGTGGCACCGTG 
AAGCCCAAGCGCAGGAAAAAGCATGCAGTGGCATCGGCTGCCCAAGGGCAGGAGGCTT 
TGGTCGGACCATCCCTTCAGCCGCAAGAAGCGGCAAGGGAAGCTGATGCTGTAGCACT 
GGGTACGCTCAGGAGCAAACTGGTTAAGATGGAGAGCGACCCTGAAGGTGGTGCGGGG 
TCGGGAGTGCGAGCCGGGGGCCCAGACTTCCTGCAGCCCAGCTCCAGGGAGAGCAACA 
TTAGGATCTACAG CG AGAG CGCCCCCTCCTGGCTGAG CAAAG ATG ACATCCGAAG AAT 
G CGACTCTTGG CGGACAGCGCAGTGGCAGGG CTCCGG CCTGTGTCCTCTAGGAG CGGA 
GCCCGTTTGCTGGTGCTGGAGGGGGGCGCACCTGGCGCTGTGCTCCGCTGTGGCCCTA 
GCCCCTGTGGGCTTCTCAAGCAGCCCTTGGACATGAGTGAGGTGTTTGCCTTCCACCT 
AGACAGGATCCTGGGGCTCAACAGGACCCTGCCGTCTGTGAGCAGGAAAGCAGAGTTC 
ATCCAAGATGGCCGCCCATGCCCCATCATTCTTTGGGATGCATCTTTATCTTCAGCAA 
GTAATGACACCCATTCTTCTGTTAAGCTCACCTGGGGAACTTATCAGCAGTTGCTGAA 
ACAGAAATG CTGGCAGAATGGCCGAGT AC CCAAGCCTGAAT CAGGTTGT ACTGAAAT A 
CATCATCATGAGTGGTCCAAGATGGCACTCTTTGATTTTTTGTTACAGATTTATAATC 
GCTTAGATACAAATTGCTGTGGATTCAGACCTCGCAAGGAAGATGCCTGTGTACAGAA 
TGGATTGAGGCCAAAATGTGATGACCAAGGTTCTGCGGCTCTAGCACACATTATCCAG 
CGAAAGCATGACCCAAGGCATTTGGTTTTTATAGACAACAAGGGTTTCTTTGACAGGA 
GTG AAG AT AACTTAAACTTCAAATTGTTAG AAGG C ATCAAAG AGTTTCCAG CTTCTGC 
AGTTTCTGTTTTGAAGAGCCAGCACTTACGGCAGAAACTTCTTCAGTCTCTGTTTCTT 
GATAAAGTGTATTGGGAAAGTCAAGGAGGTAGACAAGGAATTGAAAAGCTTATCGATG 
TAATAGAACACAGAGCCAAAATTCTTATCACCTATATCAATGCACACGGGGTCAAAGT 
ATTACCTATGAATGAATGACAAAAGAATCTTCTGGCTAGGGTGTTAGATATATTTATG 
CATTTTTGGTTTTGTTTTTAAATCAAGCACATCAACCTCAAGCCCGTTTAGCAATGAG 




ORF Start: ATG at 90 


ORF Stop: TGA at 1641 




SEQ ID NO: 132 


517 aa 


MW at 57179.9kD 


NOV41b, 

CG57814-02 Protein Sequence 


MTCPDKPGQLINWFICSLCVPRVRKLWSSRRPRTRRNLLLGTACAIYLGFLVSQVGRA 
SLQHGQAAEKGPHRSRDTAEPSFPEIPLDGTLAPPESQGNGSTLQPNVVYITLRSKRS 
KPANIRGTVKPKRRKKHAVASAAQGQEALVGPSLQPQEAAREADAVALGTLRSKLVKM 
ESDPEGGAGSGVRAGGPDFLQPSSRESNIRIYSESAPSWLSKDDIRRMRLLADSAVAG 
LRPVSSRSGARLLVLEGGAPGAVLRCGPSPCGLLKQPLDMSEVFAFHLDRILGLNRTL 
PSVSRKAEFIQDGRPCPIILWDASLSSASNDTHSSVKLTWGTYQQLLKQKCWQNGRVP 
KPESGCTEIHHHEWSKMALFDFLLQIYNRLDTNCCGFRPRKEDACVQNGLRPKCDDQG 
SAALAHIIQRKHDPRHLVFIDNKGFFDRSEDNLNFKLLEGIKEFPASAVSVLKSQHLR 
QKLLQSLFLDKVYWESQGGRQGIEKLIDVIEHRAKILITYINAHGVKVLPMNE 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 41B. 



Table 41B. Comparison of NOV41a against NOV41b. 


Protein Sequence 


NOV41a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV41b 


1..519 
1..517 


493/519 (94%) 
497/519(94%) 
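
Table 41B summarizes a pairwise comparison as a count of identical positions over an alignment length. The following Python sketch shows one way such a count could be obtained from two sequences that have already been aligned, with gaps written as "-". It does not perform the alignment itself, and the example strings are hypothetical rather than taken from the NOV41a and NOV41b alignment.

```python
def count_identities(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple[int, int]:
    """Return (identical_positions, alignment_length) for two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return identical, len(aligned_a)


# Toy alignment (hypothetical): one substitution and one gap.
identical, length = count_identities("MTCPDKPGQ-LINW", "MTCPDRPGQALINW")
print(f"{identical}/{length} ({100 * identical // length}%)")  # 12/14 (85%)
```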



Further analysis of the NOV41a protein yielded the following properties shown in 
Table 41C. 



Table 41C. Protein Sequence Properties NOV41a 


PSort 
analysis: 


0.5500 probability located in endoplasmic reticulum (membrane); 0.2404 
probability located in lysosome (lumen); 0.1000 probability located in 
endoplasmic reticulum (lumen); 0.1000 probability located in outside 


SignalP 
analysis: 


Likely cleavage site between residues 59 and 60 
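
A predicted cleavage site between residues 59 and 60 implies a 59-residue signal peptide followed by the mature chain. The Python sketch below shows the corresponding split, assuming 1-based residue numbering; `split_at_cleavage` is an illustrative name and the input is a placeholder of the same length as NOV41a, not the actual sequence.

```python
def split_at_cleavage(protein: str, last_signal_residue: int) -> tuple[str, str]:
    """Split a sequence after the given 1-based residue into (signal, mature)."""
    seq = "".join(protein.split())
    if not 0 < last_signal_residue < len(seq):
        raise ValueError("cleavage position lies outside the sequence")
    return seq[:last_signal_residue], seq[last_signal_residue:]


# Placeholder 519-residue sequence (hypothetical content, NOV41a length).
placeholder = "M" + "A" * 518
signal, mature = split_at_cleavage(placeholder, 59)
print(len(signal), len(mature))  # 59 460
```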



A search of the NOV41a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 41D. 



Table 41D. Geneseq Results for NOV41a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV41a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAU12276 


Human PRO6001 polypeptide 
sequence - Homo sapiens, 519 aa. 
[WO200140466-A2, 07-JUN-2001] 


1..519 
1..519 


518/519(99%) 
519/519(99%) 


0.0 


AAM39125 


Human polypeptide SEQ ID NO 
2270 - Homo sapiens, 519 aa. 
[WO200153312-A1, 26-JUL-2001] 


1..519 
1..519 


518/519(99%) 
519/519(99%) 


0.0 


AAM40911 


Human polypeptide SEQ ID NO 
5842 - Homo sapiens, 537 aa. 
[WO200153312-A1, 26-JUL-2001] 


1..519 
19..537 


491/527(93%) 
495/527(93%) 


0.0 


AAM41373 


Human polypeptide SEQ ID NO 
6304 - Homo sapiens, 479 aa. 
[WO200153312-A1, 26-JUL-2001] 


212..512 
161..471 


130/316(41%) 
180/316(56%) 


1e-64 


AAM39587 


Human polypeptide SEQ ID NO 
2732 - Homo sapiens, 397 aa. 
[WO200153312-A1, 26-JUL-2001] 


212..512 
79..389 


130/316(41%) 
180/316(56%) 


1e-64 



In a BLAST search of public sequence databases, the NOV41a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 41E. 



Table 41E. Public BLASTP Results for NOV41a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV41a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9ET25 


HYPOTHETICAL BASIC 
PROTEIN 1-19 - Mus musculus 
(Mouse), 517 aa. 


1..519 
1..517 


431/519(83%) 
462/519(88%) 


0.0 


Q9NYZ0 


AD021 PROTEIN - Homo sapiens 
(Human), 246 aa. 


274..519 
1..246 


246/246 (100%) 
246/246 (100%) 


e-145 


Q9UFP1 


HYPOTHETICAL 49.5 KDA 
PROTEIN - Homo sapiens 
(Human), 448 aa (fragment). 


212..512 
130..440 


129/316(40%) 
179/316 (55%) 


2e-63 



PFam analysis predicts that the NOV41a protein contains the domains shown in the 
Table 41F. 



Table 41F. Domain Analysis of NOV41a 


Pfam Domain 


NOV41a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


SQS_PSY: domain 1 of 1 


109..145 


8/37 (22%) 
29/37 (78%) 


9.9 



Example 42. 

The NOV42 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 42A. 



Table 42A. NOV42 Sequence Analysis 




SEQ ID NO: 133 


1294 bp 


NOV42a, 

CG59327-01 DNA Sequence 


GATGGCCACCTTGAACGTTGTACTGATGTTGATGCCCCTTGCCCAGTACATTTTCCAT 
TGTTTTATAACTGTGCTACTGAAGTACTTGTGTGCAGAGTATGGCTGGAGGAATGCCA 
TGTTGATCCAAGGCGCCGTTTCCTTAAACCTGTTTGTTTTTGGGACCCTCATGAGGCC 
CCTCCCTCCTGGGAAAAACCCAAATGAC CCAGAAGAGAAAGATCTGCG CGTCCTG CCC 
GCGCACTCCACAGAGTCTGTAATGTCAAATGGACAGCAGGGAAGAATAGAAGAGAAGG 
ATGGCGGGTCTGGGAACGAGGAGACCCTCTGTGACCTGCAAGCCCAGGAGTGCAAGCC 
CAGGAGTGCCCCGATCAGGCCAGATCATGTGCGCTTTCCGGTTCTGAAGACGGTCAGC 
TGG CTCATTATG AGAGTCAAG AAGGGCTTCGAGGATTGGTACTCAGG CTATTTTGGG A 
CAGCCAGCCTATT^ACAAATCGAATGTTTGTAGCCTTTGTTTTCTGGGCTTCATTTGC 
ATACAGCAGCTTTGTCATCTCCTT^ATTCATCTCCCAGAAATGGTCAATTTGTATAAC 
TT ATTGGAG CAAA (TG AAGGTTTT CC CTCTG ACTTC AATTAT AG C AAT AGTTCA CAT TG 
TTGGAAAAGTGATCCTGGGCGTCATAGCTGACTTACCTTGCATCAGTGTTTGGAATGT 
CTTCCTGTTX^CCAGCTTCGTTCTTGTCCTCAGTAT^TTTGTTTTCCTGCCTTTGATG 
C^TATGTACGCTGGCCTGGTGGTCATCTGCACACTGACAGGGTTTTCCAGCGGTTATT 
TCTC CCTAATG CCCATAGTGACTG AAG ACTTGGTTGGCATTG AACATTTGGCCAATG C 
CTACGGCATCATCATCTGTG CTAATGGCATCTCTG CGTTGTTGGGACCACCTTTTGCA 
GGTAAACTGTCTGAGGTTTTAAGAGTTCATAGTGCATATAGATACGGTGTGTTAGCTC 
TGCGAGGAGACGGATGCAGAGCACTCACATCTTCTCTTATACATAGAAGTGAAATGGC 
TTTCT AAAGTT AGATCACTGGCCAG AGTTTTTGAGTCACAAGAG CT ATTC CACAGATT 


TCCTTTAGAAAAACAAT CACCACTGGCAGT CCACTT CAGTGACACAG AATGGGTTGCA 


GAACTTGCTTACTT ATGTG ACACATTCAACCTG CTCAATGAACTC AATCTGT CACTTC 


AGGGG AG AAGG ACAACTGTG TT CAAGTCAG CAAAT AAAGTG G CT A CATTCAAAAC CAA 


ACTGGAATTACGGGGGTG 




ORF Start: ATG at 2 


ORF Stop: TAA at 1049 




SEQ ID NO: 134 


349 aa MW at 38694.2kD 


NOV42a, 

CG59327-01 Protein Sequence 


MATLNVVLMLMPIJ^QYIFHCFITVLLK^ 

LPPGKNPNDPEEKDLRVLPAHSTESVMSNGQQGRIEEKDGGSGNEETLCDLQAQECKP 
RSAPIRPDHVRFPVLKTVSWLIMRVKKGFEDWYSGYFGTASLFTNRMFVAFVFWASFA 
YSSFVISFIHLPEIVNLYNLLEQTKVFPLTSIIAIVHIVGKVILGVIADLPCISVWNV 
FLLASFVLVLSIFVLLPLMHMYAGLVVICTLTGFSSGYFSLMPIVTEDLVGIEHLANA 
YGIIICANGISALLGPPFAGKLSEVLRVHSAYRYGVLALRGDGCRALTSSLIHRSEMA 
F 



Further analysis of the NOV42a protein yielded the following properties shown in 
Table 42B. 



Table 42B. Protein Sequence Properties NOV42a 


PSort 
analysis: 


probability located in plasma membrane; 0.4600 probability located in Golgi 
body; 0.1000 probability located in endoplasmic reticulum (lumen) 


SignalP 
analysis: 


Likely cleavage site between residues 32 and 33 



A search of the NOV42a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 42C. 



Table 42C. Geneseq Results for NOV42a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV42a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAO07132 


Human polypeptide SEQ ID NO 21024 
- Homo sapiens, 107 aa. 
[WO200164835-A2, 07-SEP-2001] 


257..331 
5..81 


49/77 (63%) 
58/77 (74%) 


6e-20 


AAY31642 


Human transport-associated protein-4 
(TRANP-4) - Homo sapiens, 465 aa. 
[WO9941373-A2, 19-AUG-1999] 


157..342 
221..401 


54/197 (27%) 
86/197 (43%) 


1e-07


AAY02737 


Human secreted protein encoded by 
gene 88 clone HKAFB88 - Homo 
sapiens, 229 aa. [WO9902546-A1, 21- 
JAN-1999] 


198..342 
24..164


41/147 (27%) 
65/147 (43%) 


9e-06 



In a BLAST search of public sequence databases, the NOV42a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 42D. 



Table 42D. Public BLASTP Results for NOV42a 


Protein 
Accession 
Number 


Protein/Organism/Length


NOV42a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96NI7 


CDNA FLJ30794 FIS, CLONE 
FEBRA2001093, WEAKLY 
SIMILAR TO 
MONOCARBOXYLATE 
TRANSPORTER 4 - Homo sapiens 
(Human), 336 aa. 


22..331 
1..310 


250/312(80%) 
266/312(85%) 


e-138 


Q9D1K0 


1110004H10RIK PROTEIN - Mus
musculus (Mouse), 336 aa. 


22..331 
1..310 


220/312(70%) 
250/312(79%) 


e-119 
AAL39716


LD30953P - Drosophila melanogaster 
(Fruit fly), 894 aa. 


142..314 
665..843


50/180(27%) 
89/180(48%) 


2e-15 


Q9V9B3 


CG3409 PROTEIN - Drosophila 
melanogaster (Fruit fly), 800 aa. 


142..314 
571..749 


50/180 (27%) 
89/180 (48%) 


2e-15 


Q9W0L6 


CG13907 PROTEIN - Drosophila 
melanogaster (Fruit fly), 816 aa. 


157..331 
565..738 


55/178 (30%) 
91/178 (50%)


1e-14



PFam analysis predicts that the NOV42a protein contains the domains shown in the 
Table 42E. 



Table 42E. Domain Analysis of NOV42a 


Pfam Domain 


NOV42a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


oxidored_q3: domain 1 of 1


197..314 


25/177(14%) 
73/177(41%) 


9.1 



Example 43. 

The NOV43 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 43A.



Table 43A. NOV43 Sequence Analysis 




SEQ ID NO: 135


455 bp 


NOV43a, 

CG59494-01 DNA Sequence 


TAGAACTGTCTTGAGCTCTCACCOVTCACGATGAGCAACAAATTCTTGGGAACCTGGA 
AG CTGGT CTCCAGTG AAAACTTTGAGG ATTACATGAAAG AACTGGGAGTGAATTTCG C 
AGCCCGGAACATGGCAGGGTTAGTGAAACCGAGAGTAACTATTAGTGTTGATGGGAAA 
ATGATGACCATAAGAACAGAAAGTTCTTTCCAGGACACTAAGATCTCCTTCAAGCTGG 
GGGAAGAATTTGATGAAACTACAGCAGACAACCGGAAAGTAAAGAGCACCATAACATT 
AG AGAATGGCTCAATG ATTCACGTCC AAAAATGG CTTGGCAAAGAGACAACAAT CAAA 
AGAAAAATTGTGGATGAAAAAATGGTAGTGGAATGTAAAATGAATAATATTGTCAGCA 
C CAG AATCT ACG AAAAGGTCTG AAAAAT C ATTTC TT CATT G AAGTG G CT 




ORF Start: ATG at 31


ORF Stop: TGA at 427 




SEQ ID NO: 136


132 aa MW at 15096.4kD 


NOV43a, 

CG59494-01 Protein Sequence 


MSNKFLGTWKLVS S EN FED YMKE LGVN FAARNMAG L VK PTVT I SVDG KMMT IRTESSF 
QDTKISFIOGEEFDETTADNRKVKSTITLENGSM IHVQKWLGKETT I KRKI VDEKMW 
ECKMNNI VSTRI Y E KV 
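
The ORF Start entry in Table 43A can be checked against the first residues of the predicted protein. The sketch below translates the first ten codons of the NOV43a ORF (nucleotides 31 to 60 of SEQ ID NO: 135) using the standard genetic code. The helper name `translate` and the use of the standard code (NCBI translation table 1) are assumptions made for illustration; the specification does not state which tool produced the translation.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First ten codons of the NOV43a ORF, copied from the DNA sequence above.
orf_start = "ATGAGCAACAAATTCTTGGGAACCTGGAAG"
print(translate(orf_start))  # MSNKFLGTWK
```

The result agrees with the first ten residues of SEQ ID NO: 136 (MSNKFLGTWK).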



Further analysis of the NOV43a protein yielded the following properties shown in 
Table 43B. 



Table 43B. Protein Sequence Properties NOV43a 


PSort 
analysis: 


0.6500 probability located in cytoplasm; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.0053 probability located in microbody (peroxisome) 
SignalP
analysis: 



No Known Signal Sequence Predicted 
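
The PSort entry in Table 43B is a semicolon-separated list of scores and locations. A minimal sketch of how such an entry could be parsed to find the highest-scoring location follows; the function name `parse_psort` and the regular expression are illustrative assumptions, not part of the PSort program.

```python
import re

# PSort entry for NOV43a, copied from Table 43B.
psort_text = (
    "0.6500 probability located in cytoplasm; 0.1000 probability located in "
    "mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); "
    "0.0053 probability located in microbody (peroxisome)"
)


def parse_psort(text):
    """Return a list of (location, score) pairs from a PSort summary string."""
    pattern = re.compile(r"(\d+\.\d+) probability located in ([^;]+)")
    return [(location.strip(), float(score))
            for score, location in pattern.findall(text)]


predictions = parse_psort(psort_text)
best_location, best_score = max(predictions, key=lambda item: item[1])
print(best_location, best_score)  # cytoplasm 0.65
```

The scores are reported per location and are not normalized here, so they are compared only by rank.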



A search of the NOV43a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 43C. 



Table 43C. Geneseq Results for NOV43a 


Geneseq
Identifier


Protein/Organism/Length [Patent #,
Date]


NOV43a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Region


Expect
Value


AAW40227 


Human myelin P2 protein - Homo
sapiens, 136 aa. [WO9803647-A2,
29-JAN-1998]


1..130 
1..130


89/130 (68%) 
107/130 (82%)


2e-47 


AAW40228 


Bovine myelin P2 protein - Bos 
taurus, 136 aa. [WO9803647-A2, 
29-JAN-1998]


1..130 
1..130 


89/130(68%) 
106/130(81%) 


9e-47 


AAY90320 


Human AFABP protein sequence - 
Homo sapiens, 132 aa. 
[WO200047734-A1, 17-AUG-2000] 


1..131 
1..131 


84/131 (64%) 
110/131 (83%) 


3e-46 


AAY90319 


Mouse AFABP protein sequence - 
Mus sp, 132 aa. [WO200047734-A1, 
17-AUG-2000] 


1..131 
1..131 


83/131 (63%) 
108/131 (82%) 


7e-45 


AAG66576 


Mouse MDGI polypeptide - Mus sp, 
133 aa. [US6232291-B1, 15-MAY- 
2001] 


1..131 
1..131 


73/131 (55%) 
103/131 (77%) 


6e-40 



In a BLAST search of public sequence databases, the NOV43a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 43D. 
Table 43D. Public BLASTP Results for NOV43a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV43a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Portion


Expect
Value


MPRB2 


myelin P2 protein - rabbit, 132 aa.


1..132
1..132


95/132 (71%)

109/132 (81%) 




P02691 


Myelin P2 protein - Oryctolagus 
cuniculus (Rabbit), 131 aa.


2..132 

1..131


94/131 (71%) 


1e-48


MPHU2 


myelin P2 protein [validated] - 
human, 132 aa. 


1..132
1..132 


92/132 (69%) 
109/132 (81%) 


3e-48 


Q90X56 


ADIPOCYTE FATTY ACID 
BINDING PROTEIN - Gallus 
gallus (Chicken), 132 aa. 


1..131 

1..131


86/131 (65%) 
113/131 (85%) 


1e-47


P02689 


Myelin P2 protein - Homo sapiens 
(Human), 131 aa. 


2..132
1..131 


91/131 (69%) 
108/131(81%) 


1e-47



PFam analysis predicts that the NOV43a protein contains the domains shown in the
Table 43E. 



Table 43E. Domain Analysis of NOV43a


Pfam Domain 


NOV43a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


lipocalin: domain 1 of 1 


4..132


45/157 (29%) 
113/157 (72%) 


3.2e-36 



Example 44. 

The NOV44 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 44A. 



Table 44A. NOV44 Sequence Analysis 




SEQ ID NO: 137 1561 bp


NOV44a, 

CG59432-01 DNA Sequence 


AAGAATTGTAGCTCTCCACTGAATTGCAGGGGTTCTTGAATGTTGTCAACATTTGGAG 


GCAGTTGGAGGAGGGAGCTCTATTGATGAAAAATGGCTACATATTCAAAATTTCAGTG 


TATACCAGGAAGATAATTCAATTCAATCTCTGGCTTACCCAAAGAATCTTGGAGTTAC 


TGCCAATGAGGAAATCCCCAGGGTCTAATAAAAATATCTTTAGGAGTGAAGGAGTTAA 


CTGAGTGTGTAAGCTTTATCTTCTGTCCAATGGACrTGTGGTTTGCTTATAAAACTCT 


C CAGT AAATAATTGTT AGAGAC CTGTCATTGATAGCAGTTGCTAGTTGCTGCCTTTTA 


AGAGCTCGTTGATTCCTCTGCAAGGTGGTGCAGCATCCTCTGTCCCTTCATTCATTTC 


AGATCTACTCAGGTCTCCCTGTAAACAGATCTCTCGGATCAATAAGCATQAATGACGA 


AGACTACAGCACCATCTATGACACAATCCAAAATGAGAGGACGTATGAGGTTCCAGAC 
CAGCCAGAAGAAAATGAAAGTCCCCATTATGATGATGTCCATGAGTACTTAAGGCCAG 
AAAATX3ATTTATATGCCACTCAGCTGAATACCCATGAGTATGATTTTGTGTCAGTCTA 
TACCATTAAGGGTGAAGAGACCAGCTTGGCCTCTGTCCAGTCAGAAGACAGAGGCTAC 
CTCCTGCCTGATGAGATATACTCTGAACTCCAGGAGGCTCATCCAGGTGAGCCCCAGG
AGGACAGGGGCATCTCAATGGAAGGGTTATATTCATCAACCCAGGACCAGCAACTCTG 
CGCAGCAGAACTCCAGGAGAATGGGAGTGTGATGAAGGAAGATCTGCCTTCTCCTTCA 
AGCTTCACCATTCAGCACAGTAAGGCCTTCTCTACCACCAAGTATTCCTGCTATTCTG 
ATGCTGAAGGTTTGGAAGAAAAGGAGGGAGCTCACATGAACCCTGAGATTTACCTCTT 
TGTGAAGGTAAGGTCTGCCTCTGACAGGCATACCCTGTTCATGCAGATATTATGGCTG 
GTGTTTTATTTTGCTCTGAATGACCAGGGAAAGATTCATAATGCCATGGTCCTTGGAT 
CTCAATACATATTCAGGAGTCGGAGGGACTAAATCAGTCATTAGAGTGTACTCAGCTC 
TTCACAAAATTAGAGGAATTGGAAGGTGCATTTAAAGCACGTATTTAATCACTGACTT 


TTACATACCATGGGCAAAGTATTTTTCAAAACGGTTCACATAAGTGAGCCATAACTGC 


TGCCCAAATCCTTGCCATTGTGGCTGACATTAAGTACATTTTTCTGTCTGGTTAAATT 


TCCTTTGTCGACATGTTTAAAAGTGAAACCAAAGCTTGTGAAAGAAAGACCTTCTTGT 


GCTTCTAAGGTCACAGATTTGTCAGATAGGTGGTCAATAAAGGCTATCTCTGTCACTA 


GCTTGCCCCTTTGGCACCAATATAACTAAAAATTTGATGAAGTCAAATGATTTCAGTA 


GTAGTAAGACACTACCAGTGTTAATGTTTAATACTTACGATATCTAAACAGAA 




ORF Start: ATG at 454 


ORF Stop: TAA at 1132 




SEQ ID NO: 138


226 aa MW at 26132.2kD 


NOV44a, 

CG59432-01 Protein Sequence 


MNDEDYST I YDT I QNERTYEVPDQ PEENES PHYDD VHE YLRPENDL YATQLNTHEYDF 
VSVYTIKGEETSLASVQSEDRGYLLPDEIYSELQEAHPGEPQEDRGISMEGLYSSTQD 
QQLCAAELQENGSVMKEDLPSPSSPTIQHSKAFSTTKYSCYSDAEGLEEKEGAHMNPE 
I YL FVKVRS AS DRHTLFMQ I LWLVFY FALNDQG KI HNAMVLGSQ YI FRS RRD 




SEQ ID NO: 139


809 bp 


NOV44b, 

CG59432-02 DNA Sequence 


ATCCTCTGTCCCTTCATTCATTTCAGATCTACTCAGGTCTCCCTGTAAACAGATCTCT 


CGGATCAATAAGCATGAATGACGAAGACTACGGCACCATCTATGACACAATCCAAAAT 
GAG AGG ACGTATGAGGTTCCAGAC CAGCCAGAAGAAAATGAAAGTCC CCATTATG ATG 
ATGTCCATGAGTACTTAAGGCCAGAAAATGATTTATATGCCACTCAGCTGAATACCCA 
TGAGTATGATTTTGTGTCAGTCTATACCATTAAGGGTGAAGAGACCAGCTTGGCCTCT 
GTCCAGTCAGAAGACAGAGGCTACCTCCTGCCTGATGAGATATACTCTGAACTCCAGG 
AGGCTCATCCAGGTGAGCCCCAGGAGGACAGGGGCATCTCAATGGAAGGGTTATATTC 
ATCAACCCAGGACCAGCAACTCTGCGCAGCAGAACTCCAGGAGAATGGGAGTGTGATG 
AAGGAAGATCTGCCTTCTCCTTCAAGCTTCACCATTCAGCACAGTAAGGCCTTCTCTA 
CCACCAAGTATTCCTGCTATTCTGATGCTGAAGGTTTGGAAGAAAAGGAGGGAGCTCA 
CATGAACCCTGAGATTTACCTCTTTGTGAAGGTAAGGTCTGCCTCTGACAGGCATACC 
CTGTTCATGCAGATATTATGGCrGGTGTTTTATTTTGCTCTGAATGACCAGGGAAAGA 
TTCATAATGCCATGGTCCTTGGATCTCAATACATATTCAGGAGTCGGAGGGACTAAAT 
CAGTCATTAGAGTGTACTC AG CTCTTCACAAAATT AGAGG AATTGGAAGGTG CAT 




ORF Start: ATG at 72 


ORF Stop: TAA at 750 




SEQ ID NO: 140


226 aa MW at 26102.2kD


NOV44b, 

CG59432-02 Protein Sequence 


MNDEDYGTIYDTIQNERTYEVPDQPEENESPHYDDVHEYLRPENDLYATQLNTHEYDF 
VSVYTIKGEETSLASVQSEDRGYLLPDEIYSELQEAHPGEPQEDRGISMEGLYSSTQD 
QQLCAAELQENGSVMKEDLPSPSSFTIQHSKAFSTTKYSCYSDAEGLEEKEGAHMNPE 
I YLFVKVRSASDRHTLFMQI LWLVFYFALNDQGK I HNAMVLGSQ YI FRSRRD 
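
The ORF Start and ORF Stop positions in Table 44A can be cross-checked against the reported protein lengths. The sketch below assumes that both positions are 1-based indices of the first nucleotide of the start and stop codons, which is consistent with the values in this table; the function name `orf_codons` is hypothetical.

```python
def orf_codons(start: int, stop: int) -> int:
    """Number of coding codons between a start codon and a stop codon.

    Both arguments are assumed to be 1-based positions of the first
    nucleotide of the respective codon. The stop codon is not counted.
    """
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3


# (ORF start, ORF stop, reported protein length in aa) from Table 44A.
reported = {
    "NOV44a": (454, 1132, 226),
    "NOV44b": (72, 750, 226),
}

for name, (start, stop, aa) in reported.items():
    print(name, orf_codons(start, stop), aa)
```

For both variants the codon count equals the reported length of 226 amino acids.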



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 44B. 



Table 44B. Comparison of NOV44a against NOV44b. 


Protein Sequence 


NOV44a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV44b 


1..226
1..226


225/226 (99%) 
225/226 (99%) 
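
The single non-identical position reported in Table 44B can be located by a position-by-position comparison. The sketch below uses only the first 58 residues of each sequence as printed in Table 44A (with extraction spaces removed); the function name `substitutions` is an assumption for illustration, and the comparison is ungapped, which is adequate only because the two sequences have the same length.

```python
# First 58 residues of NOV44a (SEQ ID NO: 138) and NOV44b (SEQ ID NO: 140).
nov44a_head = "MNDEDYSTIYDTIQNERTYEVPDQPEENESPHYDDVHEYLRPENDLYATQLNTHEYDF"
nov44b_head = "MNDEDYGTIYDTIQNERTYEVPDQPEENESPHYDDVHEYLRPENDLYATQLNTHEYDF"


def substitutions(seq_a: str, seq_b: str):
    """Return (1-based position, residue in a, residue in b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs sequences of equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


print(substitutions(nov44a_head, nov44b_head))  # [(7, 'S', 'G')]
```

The serine-to-glycine difference at residue 7 corresponds to the AGC and GGC codons in the two DNA sequences of Table 44A.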



Further analysis of the NOV44a protein yielded the following properties shown in 
Table 44C. 



Table 44C. Protein Sequence Properties NOV44a 
PSort
analysis: 



0.6500 probability located in cytoplasm; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.0000 probability located in endoplasmic reticulum (membrane) 



SignalP 
analysis: 



No Known Signal Sequence Predicted 



A search of the NOV44a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 44D. 



Table 44D. Geneseq Results for NOV44a 



Geneseq 
Identifier 



Protein/Organism/Length 
[Patent #, Date] 



NOV44a 
Residues/ 

Match 
Residues 



Identities/ 
Similarities for the 
Matched Region 



Expect 
Value 



No Significant Matches Found 



In a BLAST search of public sequence databases, the NOV44a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 44E. 



Table 44E. Public BLASTP Results for NOV44a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV44a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96JT5 


CLIC5B - Homo sapiens (Human), 
410 aa. 


1..200 
1..202 


185/202(91%) 
191/202 (93%)


e-104 


Q9NPY9 


DJ447E21.4 (SIMILAR TO BOVINE 
CHLORIDE CHANNEL PROTEIN 
(P64)) - Homo sapiens (Human), 180 
aa (fragment). 


1..180 
1..180 


180/180 (100%)
180/180 (100%)


e-103 


A47104 


chloride channel 64K chain - bovine, 
437 aa. 


1..197 
1..229 


104/231 (45%) 
133/231 (57%) 


1e-39


P35526 


Chlorine channel protein p64 - Bos 
taurus (Bovine), 437 aa. 


1..197 
1..229 


103/231 (44%) 
131/231 (56%) 


1e-38



PFam analysis predicts that the NOV44a protein contains the domains shown in the 
Table 44F. 



WO 02/072757 



PCT/US02/06908



Table 44F. Domain Analysis of NOV44a 



Pfam Domain 



NOV44a Match Region 



Identities/ 
Similarities 
for the Matched Region 



Expect Value 



No Significant Matches Found 



Example 45. 

The NOV45 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 45A.



Table 45A. NOV45 Sequence Analysis 




SEQ ID NO: 141


877 bp 


NOV45a, 

CG59394-01 DNA Sequence 


ACTT TG TC CT CTTGGG CTTCAC ACAGAATC CAAAGGAG CAG AAAGT ACTTTTTGTT AT 

GTTCTTGCTCTTCTACATTTTGACCATGGTGGGCAACCTGCTCATTGTAGTC 

ACTGTCAGTGAG AC CCTGGG CTCACCAATGTACTTCTTTCTTG CTGGCTTATCATTTA 

TAGATATCATTTATTCTTCATCCATTTCCCCCAGATTGATTTCAGGCTTGTTCTTTGG 

G AAT AATT CCAT AT C CTT C C AAT C TTGCATGG CCC AG CT CTTT AT CGAG C ACATTTTC 

GGTCGGTCAGAGGTCTTTCTCCTCTTGGTGATGGCCTATCACTC 

GTAAGCCCTTGCATTATTTGGTTATCATGAGACAATGGGTGTGTGTTC 

AG TGT C CTGGGTTGGAGGATTT CTG CACT CAGT ATTTCAACTT AGC ATT ATTT ATGGG 

CTCCCATTCTGTGGCCCCAATGTCATTGATCATTTTTTCTGTGACATGTATCCCTTAT 

TGAAACTCGTCTGCACTGACACCCATGCTATT'GGCCTCTTAGTGGTGGCCAATGGAGG 

ACTGXSCTTGCACrATTGTGTTT CTGCTCTTACTCAT CTCTTATGGTGTCATCTTGCAC 

TCTTTAAAGAACCTTAGTCAGAAAGGG AGG CAAAAAGCCCTCTCAACCTG CAGTTCCC 

ACATGACrGTGGTTGTCTTCTTCTTTGTTC CTTGTATTTTTATGTATGCTAGACCTG C 

TAGGACCTTCCCCATTGACAAATCAGTCAGTGTGTTTTATAC^GTCATAACCCCAATG 

CTGAAC CCCTTAAT CTACACTCTG AG AAATTCTGAG ATGACAAGTGCTATGAAGAAGC 

TTTAGAG 




ORF Start: TTT at 3 


ORF Stop: TAG at 873 




SEQ ID NO: 142


290 aa 


MW at 32485.7kD 


NOV45a, 

CG59394-01 Protein Sequence 


FVU^FTQNPKEQKVLFVMFLl.FYILTMVGNLLIV\mm/SETLGSPMYFF 

D 1 1 YS S S I S PRL I SGLF FGNNS I S FQS CMAQL F I EH I FGGS E VFLLLVMAYDCYVA I C 

KPLHYLVIMRQWVCWLLWSWVGGFLHSVFQLSIIYGLPFCGPNVIDHFFCDMYPLL 

KLVCTDTHAIGLLVVANGGLACTIVFLLLLISYGVILHSLKNLSQKGRQKALSTCSSH 

MTVWFFFVPCI FMYARPARTF PI DKS VS VFYTVITPMLNPLI YTLRNS EMTS AMKKL 



Further analysis of the NOV45a protein yielded the following properties shown in 
Table 45B. 



Table 45B. Protein Sequence Properties NOV45a 


PSort 
analysis: 


0.6400 probability located in plasma membrane; 0.4600 probability located in 
Golgi body; 0.3700 probability located in endoplasmic reticulum (membrane); 
0.1000 probability located in endoplasmic reticulum (lumen)


SignalP 
analysis: 


Likely cleavage site between residues 42 and 43 



A search of the NOV45a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 45C.
Table 45C. Geneseq Results for NOV45a


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV45a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAU24536 


Human olfactory receptor AOLFR21 - 
Homo sapiens, 299 aa. 
[WO200168805-A2, 20-SEP-2001] 


1..290 
10..299 


273/290(94%) 
278/290 (95%) 


e-155 


AAG71950 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1631 - Homo sapiens, 
299 aa. [WO200127158-A2, 19-APR- 
2001] 


1..290 
10..299 


273/290(94%) 
278/290(95%) 


e-155 


AAG72258 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1939 - Homo sapiens,
262 aa. [WO200127158-A2, 19-APR- 
2001] 


33..290 
1..250 


234/258 (90%)
240/258 (92%)


e-131 


AAG72553 


Human OR-like polypeptide query 
sequence, SEQ ID NO: 2234 - Homo 
sapiens, 327 aa. [WO200127158-A2, 
19-APR-2001] 


1..290 
10..299 


198/290 (68%)
242/290(83%) 


e-121 


AAG71909 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1590 - Homo sapiens, 
327 aa. [WO200127158-A2, 19-APR- 
2001] 


1..290 
10..299 


198/290(68%) 
242/290 (83%)


e-121 
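
The residue ranges and the identity denominators in Table 45C can be related to one another. The sketch below assumes that the denominator of the Identities entry is the number of alignment columns, including gap columns; under that assumption the number of gap columns on either side is the denominator minus the span of the residue range. The function names are hypothetical.

```python
def span_length(residue_range: str) -> int:
    """Length of a residue range written as 'start..end' (inclusive)."""
    start, end = (int(part) for part in residue_range.split(".."))
    return end - start + 1


def gap_columns(residue_range: str, alignment_columns: int) -> int:
    """Alignment columns not covered by residues of the given range."""
    return alignment_columns - span_length(residue_range)


# AAG72258 row of Table 45C: NOV45a 33..290, match 1..250, identities 234/258.
print(span_length("33..290"))        # 258
print(span_length("1..250"))         # 250
print(gap_columns("33..290", 258))   # 0
print(gap_columns("1..250", 258))    # 8
```

Under the stated assumption, the alignment to AAG72258 contains eight gap columns in the matched sequence and none in NOV45a.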



In a BLAST search of public sequence databases, the NOV45a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 45D. 



Table 45D. Public BLASTP Results for NOV45a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV45a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9QW37 


OR18 ODORANT RECEPTOR -
Rattus sp, 307 aa. 


1..290 
10..299 


192/290 (66%) 
237/290 (81%) 


e-115 


Q96R66 


OLFACTORY RECEPTOR - 
Homo sapiens (Human), 213 aa
(fragment). 


57..269 
1..213 


198/213 (92%) 
202/213 (93%) 


e-111 


Q9R0K2 


ODORANT RECEPTOR MOR18 -
Mus musculus (Mouse), 308 aa. 


1..290 
10..299 


177/290 (61%) 
229/290 (78%) 


e-105 
Q9R0K1


ODORANT RECEPTOR A16 - 
Mus musculus (Mouse), 302 aa. 


1..290 
10..299 


171/290 (58%) 
226/290(76%) 


e-102 


CAC88333 


SEQUENCE 34 FROM PATENT
WO0164879 - Homo sapiens
(Human), 309 aa. 


1..290 
10..299 


167/290 (57%) 
221/290 (75%) 


5e-99 
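
The Expect values in Table 45D use a shorthand in which a missing mantissa (for example e-115) stands for 1e-115. A minimal sketch for converting these entries to numbers, so that hits can be compared or sorted, is shown below; the function name `parse_expect` is an assumption for illustration.

```python
def parse_expect(text: str) -> float:
    """Convert an Expect value such as 'e-115', '5e-99' or '0.0' to a float."""
    text = text.strip()
    if text.startswith("e"):
        # Shorthand with an implied mantissa of 1.
        text = "1" + text
    return float(text)


# Expect values from Table 45D, in the order the hits are listed.
values = ["e-115", "e-111", "e-105", "e-102", "5e-99"]
parsed = [parse_expect(value) for value in values]

# Smaller Expect values indicate more significant matches.
print(parsed == sorted(parsed))  # True
```

The check confirms that the hits in Table 45D are listed from most to least significant.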



PFam analysis predicts that the NOV45a protein contains the domains shown in the 
Table 45E. 



Table 45E. Domain Analysis of NOV45a 


Pfam Domain 


NOV45a Match Region


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


7tm_1: domain 1 of 1


30..276 


50/268 (19%) 
174/268 (65%) 


4.4e-23 



Example 46. 

The NOV46 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 46A. 



Table 46A. NOV46 Sequence Analysis 




SEQ ID NO: 143 


1746 bp 


NOV46a, 

CG59383-01 DNA Sequence 


ATAATTCAGTTTGAAAACCAGTCGTTTCTCTTTCCTTCCCTATAGGTGTAAAGAATAT 


CCAGCTGGTGGCTACAGTTCCCCCTCTGGTTTTGCTGCCATGCATCCrGGGCGAACTA 


CTGGTAAAGGGCCCTCTACTCACACTCAGATTGACCAGCAACCTCCACGGCTTCTCAT 
TGTGCACATTGCTCTACCGTCCTGGGCTGACATCTGCACCAACCTCTGTGAGGCTCTG 
CAGAACTTCTTCTCTCTAG CCTG C AGCTTGATGGGCCCCAGCCGCATGTCCCTGTTCA 
GTTTATACATGGTACAAGATCAGCATCAGTGCATCCTCCCTTTTGTCCAAGTGAAAGG 
GAACTTTGCTAGGTTGCAGACCTGCATCTCAGAACTCCGCATGTTACAGAGAGAAGGG 
TGTTTCAGATCACAAGGTGCTTCTCTGCGGCTGGCAGTAGAGGATGGGCTCCAGCAAT 
T CAAAC AAT ACAG CAGACATGTG ACCACAAGGGCAGCTCTG ACCTAT ACCT CCCTGGA 
GATTACTATTCTGACTTCTCAGCCTGGAAAAGAGGTGGTCAAACAGTTGGAGGAAGGG 
TTGAAAG AT ACAG AC CT AGCCAGAGTCAGGAGGTTTCAGGT CG TTGAGGTCACAAAGG 
GAATCCTAGAGCACGTGGACTCAGCGTCTCCTGTTGAGGATACCAGCAATGATGAGAG 
TTCTATTCTGGGAACTCACATTGACCTTCAGACTATAGACAATGATATCGTCAGCATG 
GAGATTTTCTTCAAAGCCTGGCTACATAACAGTGGAACAGACCAAGAACAAATCCATC 
TTCTT'CTTTCTTCACAGTGTTTCAGCAACATTTCCAGACCCAGAGATAATCCAATGTG 
TCTGAAATGTGATCTCCAAGAGCGACTGCTCTGCCCATCCCTACTCGCTGGCACAGCT 
GACGGCTCCTTGAGAATGGATGACCCTAAAGGAGACTTCATCACACTCTACCAGATGG 
CTTCCCAGTCATCGGCCTCTCATTACAAGCTCCAAGTGATCAAGGCTTTAAAATCTAG 
CGGGCTCTGCGAGTCATTCACATATGGACTCCCGTTCATCCTCAGACCTACAAGCTGT 
TGGCAGCTGGACTGGGATCAGCTX3GAGACAAATCAGCAACATTTCCATGCTTTGTGTC 
ACAGCCTGCTGAAAAGGGAATGGCTGCTGTTAGCCAAGGGGGAACCACCGGGCCCAGG 
ACACAGCCAGAGAATTCCTGCCAGCACCTTCTATGTGATCATGCCGTCACACTCCCTC 
ACACTG CTGGT AAAGG CGGTGGC C A CG CGGG AA CTG ATG CTGCC CAG C ACCTT C C CCC 
TGCTACCTGAGGACCCACATGATGATAGCCTTAAGAATAGCATGCTGGACAGCCTGGA 
GCTGGAGCCCACCTACAACCCCTTGCATGTTCAAAGCCACCTGTACTCACACCTGAGC 
AGCATCTATGCCAAGCCTCAGGGGCGGCTCCACCCACACTGGGAGAGCCGAGCTCCGA 
G AAAGACTGGGCAGTTG CAGACC AACCGAGCTCGAGCTACTGTGGCCCCCCTG CCTAT 
G ACTCCTGTCCCAGG CAGAGCCTCCAAG ATGCCAG CAGCCAG C AAATCTTCCT CAG AT 
GCCTTCTTCCTCCCTTCAGAGTGGGAGAAGGATCCCTCAAGGCCCTAAGTCACCAGCA 
CCAGAGCCCAGCTGCCCAGCTTAACCATATCCATGCTCAGGTTCACATAATGGCTATC 


TGTGGT 




ORF Start: ATG at 98


ORF Stop: TAA at 1670 




SEQ ID NO: 144 


524 aa MW at 58691.3kD
NOV46a,

CG59383-01 Protein Sequence 


MHPGRTTGKG PSTHTQI DQQPPRLLI VH I ALPS WAD I CTNLCEALQNFFS LACSLMGP 
S RMS LF S L YMVQDQHEC I L P FVQ VKGN F ARLQTC I S E LRMLQREGC FRSQG AS LRLAV 
ErXSLQQFKQYSRHVTTRAALTYTSLEITILTSQPGKEVVKQLEEGLKDTDLARVRRFQ 
WE VTKGI LEHVDSAS PVEDTSNDE SS I LGTD I DLQTI DNDI VSME I FFKAWLHNSGT 
DQEQIHLLLSSQCFSNISRPRDNPMCLKCDLQERLLCPSLLAGTADGSLRMDDPKGDF 
I T LYQMASQ S S ASHYKLQ VI KALKS S G LCES LTYGL P F I LRPTS CWQLDWD ELETNQQ 
HFHALCHSLLKREWLLLAKGE P PG PGHSQRI PASTFYVI MPSHS LTLLVKAVATRELM 
LPSTFPLLPEDPHDDSLKNSMLDSLELEPTYNPLHVQSHLYSHLSSIYAKPQGRLHPH 
WESRAPRKTGQLQTNRARATVAPLPMTPVPGRASKMPAASKSSSDAFFLPSEWEKDPS 
RP 




SEQ ID NO: 145


1647 bp 


NOV46b, 

CG59383-02 DNA Sequence 


AAAGAATATCCAGCTGGTGGCTACAGTTCCCCCTCTGGTTTTGCTGCCATGCATCCTG 


GGCGAACTACTGGTAAAGGGCCCTCTACTCACACTCAGATTGACCAGCAACCTCCACG 
GCTTCTCATTGTGCACATTGCTCTACCGTCCTGGGCTGACATCTGCACCAACCTCTGT 
GAGGCTCTGCAGAACTTCTTCTCTCTAGCCTGCAGCTTGATGGGCCCCAGCCGCATGT 
CCCTGTTCAGTTT ATACATGGTACAAGATCAG C ATGAGTG CATCCTCCCTTTTGTG CA 
AGTGAAAGGG AACTTTG CTAGGTTG CAGAC CTGCATCTCAGAACTCCGCATGTTACAG 
AGAGAAGGGTGTTTCAGATCACAAGGTGCTTCTCTGCGGCTGGCAGTAGAGGATGGGC 
TCCAGCAATTCAAACAATAC AGCAGACATGTGAC CACAAGGGCAGCTCTGAC CT ATAC 
CTCCCTGGAGATTACTATTCTGACTTCTCAGCCTGGAAAAGAGGTGGTCAAACAGTTG 
GAGGAAGGGTTGAAAGATACAGACCTAGCCAGAGTCAGGAGGTTTCAGGTCGTTGAGG 
TCACAAAGGGAATCCTAGAGCACGTGGACTCAGCGTCTCCTGTTGAGGATACCAGCAA 
TGATGAGAGTTCTATTCTGGGAACTGACATTGACCTTCAGACTATAGACAATGATATC 
GTCAGCATGGAGATTTTCTTCAAAGCCTGGCTACATAACAGTGGAACAGACCAGGAAC 
AAATCCATCTTCTTCTTTCTTCACAGTGTTTCAGCAACATTTCCAGACCCAGAGATAA 
TCCAATGTGTCTGAAATGTGATCTCCAAGAGCGACTGCTCTGCCCATCCCTACTCGCT 
GGCACAGCTGACGGCTCCTTGAGAATGGATGACCCTAAAGGAGACTTCATCACACTCC 
ACCAGATGGCTTCCCAGTCATCGGCCTCTCATTACAAGCTCCAAGTGATCAAGGCTTT 
AAAATCTAGCGGGGTCTGCGAGTCATTGACATATGGACTCCCGTTCATCCTCAGACCT 
ACAAGCTGTTGGCAG CTGG ACTGGG ATGAGCTGGAG ACAAATCAG CAACATTTCCATG 
CTTTGTGTCACAGCCTGCTGAAAAGGGAATGGCTGCTGTTAGCCAAGGGGGAACCACC 
GGGCCCAGGACACAGCCAGAGAATTCCTGCCAGCACCTTCTATGTGATCATGCCGTCA 
CACTCCCTCACACTGCTGGTAAAGGCGGTGGCCACGCGGGAACTGATGCTGCCCAGCA 
CCTTCCCCCTGCTGCCTGAGGACCCACATGATGATAGCCTTAAGAATGTGGAGAGCAT 
GCTGGACAGCCTGGAGCTGGAGCCCACCTACAACCCCTTGCATGTTCAAAGCCACCTG 
TACTCACACCTGAGCAGCATCTATGCCAAGCCTCAGGGGCGGCTCCACCCACACTGGG 
AGAGC CGAGCTCCGAGAAAGCATCCCTGCAAG ACTGGG CAGTTGCAG ACCAACCGAG C 
TCGAG CT ACTGTGGCCCCCCTGCCTATGACTC CTGTCC CAGGCAG AGCCTCCAAGATG 
CCAGCAGCCAGC^AATCTTCCTCAGATGCCTTCTTCCTGCCTTCAGAGTGGGAGAAGG 
ATCCCTCAAGGCCCTAAGTCACC 




ORF Start: ATG at 49 


ORF Stop: TAA at 1639




SEQ ID NO: 146


530 aa 


MW at 59359.1kD


NOV46b, 

CG59383-02 Protein Sequence 


MHPGRTTGKGPSTHTQIDQQPPRLLIVHIALPSWAD I CTNLCEALQNFFS LACSLMGP 
SRMSLFS LYMVQDQHECI LPFVQVKGNF ARLQTC I S ELRMLQREGCFRSQGAS LRLAV 
EDGLQQFKQYSRHVTTRAALTYTSLE I T I LTSQPGKE WKQLEEGLKDTDLARVRRFQ 
WEVT KG I LEHVDSAS PVEDTSNDES SI LGTD I DLQT I DNDIVS ME I FFKAWLHNSGT 
DQEQIHLLLSSQCFSNISRPRDNPMCLKCDLQERLLCPSLLAGTADGSLRMDDPKGDF 
ITLHQMASQSSASHYKLQVIKALKSSGLCESLTYGLPFILRPTSCWQLDWDELETNQQ 
HFHALCHSLLKREWLLLAKGE PPG PGHSQRI PASTFYVI MPSHS LTLLVKAVATRELM 
LPSTFPLLPEDPHDDSLKNVESMLDSLELEPTYNPLHVQSHLYSHLSSIYAKPQGRLH 
PHWESRAPRKHPCKTGQLQTNRARATVAPLPMTPVPGRASKMPAASKSSSDAFFLPSE 
WEKDPSRP 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 46B. 



Table 46B. Comparison of NOV46a against NOV46b. 


Protein Sequence 


NOV46a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV46b 


1..524 
1..530 


509/530 (96%) 
510/530(96%) 
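
The percentages in Table 46B follow from the match counts and the number of alignment columns. The sketch below assumes that percentages are truncated to whole numbers rather than rounded, an inference from entries such as 225/226 (99%) in Table 44B and not a documented rule; the function name `percent_truncated` is hypothetical.

```python
def percent_truncated(matches: int, columns: int) -> int:
    """Percentage of matching columns, truncated to a whole number."""
    return matches * 100 // columns


alignment_columns = 530   # denominator reported in Table 46B
nov46a_length = 524       # length of SEQ ID NO: 144
identities = 509
similarities = 510

print(percent_truncated(identities, alignment_columns))    # 96
print(percent_truncated(similarities, alignment_columns))  # 96
# Columns in which NOV46a contributes a gap, since both proteins align end to end.
print(alignment_columns - nov46a_length)                   # 6
```

The six gap columns are consistent with NOV46b (530 aa) being six residues longer than NOV46a (524 aa).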
Further analysis of the NOV46a protein yielded the following properties shown in
Table 46C. 



Table 46C. Protein Sequence Properties NOV46a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV46a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 46D. 



Table 46D. Geneseq Results for NOV46a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV46a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM34317 


Peptide #8354 encoded by probe for 
measuring placental gene expression - 
Homo sapiens, 52 aa. [WO200157272-
A2, 09-AUG-2001] 


259..310 
1..52 


52/52(100%) 
52/52(100%)


7e-23 


ABB18624


Protein #623 encoded by probe for 
measuring heart cell gene expression - 
Homo sapiens, 42 aa. [WO200157274-
A2, 09-AUG-2001] 


101..142 
1..42 


42/42(100%) 
42/42(100%)


2e-16 


AAM66343 


Human bone marrow expressed probe 
encoded protein SEQ ID NO: 26649 - 
Homo sapiens, 42 aa. [WO200157276- 
A2, 09-AUG-2001] 


101..142
1..42 


42/42(100%)
42/42(100%)


2e-16 


AAM53955 


Human brain expressed single exon 
probe encoded protein SEQ ID NO:
26060 - Homo sapiens, 42 aa. 
[WO200157275-A2, 09-AUG-2001] 


101..142 
1..42 


42/42(100%) 
42/42(100%) 


2e-16 


AAM26622 


Peptide #659 encoded by probe for 
measuring placental gene expression - 
Homo sapiens, 42 aa. [WO200157272-
A2, 09-AUG-2001] 


101..142
1..42 


42/42(100%)
42/42(100%) 


2e-16 
In a BLAST search of public sequence databases, the NOV46a protein was found to
have homology to the proteins shown in the BLASTP data in Table 46E. 



Table 46E. Public BLASTP Results for NOV46a 



Protein 
Accession 
Number 


Protein/Organism/Length 


NOV46a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9Z0E1 


D6MM5E PROTEIN - Mus 
musculus (Mouse), 529 aa. 


1..524 
1..526 


380/526 (72%) 
423/526(80%) 


0.0 


Q96L07 


SIMILAR TO DNA SEGMENT, 
CHR 6, MIRIAM MEISLER 5, 
EXPRESSED - Homo sapiens 
(Human), 365 aa. 


1..358 
1..358 


358/358(100%) 
358/358(100%) 


0.0 



PFam analysis predicts that the NOV46a protein contains the domains shown in the 
Table 46F. 



Table 46F. Domain Analysis of NOV46a 


Pfam Domain 


NOV46a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


RA: domain 1 of 1 


124..214 


18/115(16%) 
65/115(57%) 


8.4 



Example 47. 



The NOV47 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 47A. 



Table 47A. NOV47 Sequence Analysis 




SEQ ID NO: 147 960 bp 


NOV47a, 

CG58526-01 DNA Sequence 


AGGACTAAATAAAATGGCCTAAAT^TAAATATOGAT^GGGAT^CCAT^CTCTTGCAG 

ATGCCCAGAACCAAAGAAGAGGTCTGCCTGGTTTTCTTCCTGGAGCTCCAGACCCAGA 

CCAAAGCCTTCCTGCCTCTTCCAATCCAGGGAACCAAGCATGGCAGCTGAGTCTCCCT 

CTGCCAAGCAGTTTCCTGCCAACAGTCAGTCTCCCTCCTGGTCTAGAATATTTAAGCC 

AGTTAGACCTGATAATTATACACCAGCAGGTGGAGCTGCTTGTGATACTTGGTACTGA 

GACCTCCAACAAATATCAGATTAAAAACAGCTTGGGACAAAGAAT 

GAGGAAAGCATCTGCTTCAATCGT ACTTTCTGTTC CACTCTGCGATCTTGCACCCTGA 

GGATCACAGATAACTCAGGTCGAGAGGTCATTACAGTGAACAGGCCCTTGAGATGTAA 

CAGCTGCTGGTGCCCTTGCTACCTACAAGAGTTAGAAATCCAAGCCCCTCCTGGTACT 

ATAGTTGGTTACGTTACGCAGAAGTGGGACCCCTTTCTGCCTAAATTCACAATCCAAA 

ATGCAAACAAAGAAGATATTTT^AAAATTGTTGGTCCTTGTGTGACATGTGGCTCTTT 

TGGCGATGTGGATTTTGAGAAGGTGAAAACCATTAATX3AAAAGCTTACAATTGGGAAG 

ATTTCAAAGTACTGX5TCAGGATTTGTAAATGATGTCTTCACAAATGCTGACAATTTCG 

GAATTCATX3TTCCTGCAGATCTAGATCTAACAGTCAAAGCAGCAATGAT03GTGCCTC 

TTTTCTCTTTGTAAGTATX3GGCTTTGAGAGCCCAGCCCTCCAAGATGAGAAAGAGTCA 

GTGTGGCAATTCAAAAAATCAGAGTGCCCTCTCACCTCCAAACAAGCCCACTTGTTCC 
CCAGCGATGGTTCTTAGCCAGACTGAAATGAC




ORF Start: ATG at 31 


ORF Stop: TAG at 943 




SEQ ID NO: 148 


304 aa MW at 33794.2kD 


NOV47a, 

CG58526-01 Protein Sequence 


MDWDFHSLADAQNQRRGLPGFLPGAPDPDQSLPASSNPGNQAWQLSLPLPSSPLPTVS 
LPPGLEYLSQLDLIIIHQQVELLVILGTETSNKYEIKNSLGQRIYFAVEESICFNRTF 
CSTLRSCTLRITDNSGREVITVNRPLRCNSCWCPCYLQELEIQAPPGTIVGYVTQKWD 
PFLPKFTIQNANKEDILKIVGPCVTCGCFGDVDFEKVKTINEKLTIGKISKYWSGFVN 
DVFTNADNFGIHVPADLDVTVKAAMIGACFLFVSMGFESPALQDEKESVWQFKKSECP 
LTSKQAHLFPSDGS 



Further analysis of the NOV47a protein yielded the following properties shown in 
Table 47B. 



Table 47B. Protein Sequence Properties NOV47a 


PSort 

analysis: 


0.8500 probability located in endoplasmic reticulum (membrane); 0.4400 
probability located in plasma membrane; 0.4244 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial inner membrane 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV47a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 47C. 



Table 47C. Geneseq Results for NOV47a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV47a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG78341 


Human Mm-1 cell line derived 
transplantability-associated gene lb - 
Homo sapiens, 318 aa. 
[WO200164894-A2, 07-SEP-2001] 


24..282 
60..318 


152/263 (57%) 
187/263 (70%) 


5e-84 


AAB24113 


Human phospholipid scramblase 
HPLS protein sequence - Homo 
sapiens, 318 aa. [CN1259574-A, 12-
JUL-2000] 


24..282 
60..318 


152/263 (57%)
187/263 (70%)


5e-84 


AAB24112 


Mouse phospholipid scramblase 
MPLS protein sequence - Mus sp, 318 
aa. [CN1259574-A, 12-JUL-2000] 


24..282 
60..318 


152/263 (57%)
187/263 (70%)


5e-84 


AAY09309 


Human phospholipid scramblase - 


24..282 
60..318 


152/263 (57%)
187/263 (70%)


5e-84 
A2, 22-APR-1999]








AAY29323 


Human PL scramblase - Homo 
sapiens, 318 aa. [WO9936536-A2, 22-
JUL-1999] 


24..282 
60..318 


152/263 (57%) 
187/263 (70%)


5e-84 



In a BLAST search of public sequence databases, the NOV47a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 47D. 



Table 47D. Public BLASTP Results for NOV47a 


Protein
Accession 
Number 


Protein/Organism/Length 


NOV47a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9JJ00


Phospholipid scramblase 1 (PL 
scramblase 1) (Transplantability 
associated protein 1) (TRA1) (NOR1) - 
Mus musculus (Mouse), 328 aa. 


20..283
66..328 


150/267 (56%) 
191/267 (71%) 


4e-84 


Q99M50 


PHOSPHOLIPID SCRAMBLASE 1 - 
Mus musculus (Mouse), 327 aa. 


20..282
66..327


150/266(56%) 
191/266 (71%) 


6e-84 


O15162


Phospholipid scramblase 1 (PL 
scramblase 1) (Erythrocyte phospholipid 
scramblase) (Ca2+ dependent 
phospholipid scramblase 1) 
(MmTRAlb) - Homo sapiens (Human), 
318 aa. 


24..282 
60..318 


152/263 (57%) 
187/263 (70%) 


2e-83 


P58195 


Phospholipid scramblase 1 (PL 
scramblase 1) (Ca2+ dependent 
phospholipid scramblase 1) - Rattus 
norvegicus (Rat), 335 aa. 


28..282 
84..335


145/256 (56%) 
183/256(70%) 


3e-81 


Q9NRY7 


Phospholipid scramblase 2 (PL 
scramblase 2) (Ca2+ dependent 
phospholipid scramblase 2) - Homo 
sapiens (Human), 224 aa. 


55..270 
6..221 


135/217 (62%)
164/217 (75%) 


1e-75



PFam analysis predicts that the NOV47a protein contains the domains shown in the 
Table 47E. 
Table 47E. Domain Analysis of NOV47a



Pfam Domain 



NOV47a Match Region 



Identities/ 
Similarities 
for the Matched Region 



Expect Value 



No Significant Matches Found 



Example 48. 

The NOV48 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 48A. 



Table 48A. NOV48 Sequence Analysis 




SEQ ID NO: 149 957 bp


NOV48a, 

CG57851-01 DNA Sequence 


CCCCTGCTGGTGCCCAAGACCACCGTGGAAGGAATGGCTAAAGAGGAGACAAGTGAGT 


TAGAATGGGGCTTGTTACCCCCAGAAGAATTTTCCCAAGTGAATGGAATCATTCTTCA 
AAAG AAAATGTGCG ATTT CTGGG AT AAG AT CTGG AACTT C CAAG C CAAG C CTG ATGAC 
CTGCTCATTGCTTCTTACCCCAAAGCAGGTACCACTTGGACACAGGAAATTGTAGATC 
TGATACAAAATG ATGG CG ATATTG AG AAAAGCAGG CGCG CTTCCATTCAACTTC AACA 
CCCTrTCCTGGAGTGGATAAGAATGACACACGCCAGGAAAATTTTTGCAGGGATTGAC 
CAGGCTAACACAATGCCTTCCCCAAGGACCCTGAAAACTCATCTTCCTGTACAACTAC 
TG C CT CCATCCTTCTGGG AGGAAAACTGT AAG ATAATCT ATGTGGCAAGAAATGCC AA 
GGATAACCTGGTGTCCTACTACCATTTTCAAAGGATGAGCAAAGCACTCCCTGACGTT 
TTGACAGTGGGAGAATACATTATGTGTGGGGAAGTGTTGTGGGGAATATGGGAAGAGA 
TT CGG ACTTGGCAACTG CAT AG GTTG TT CTG CTGG TT CTTTG AT C ATG CTT CTGAG AA 
TCCTAGAAAGTTCAAAAGGATAATGGAATTTATGGGGAATAAACTAGATGAAGATCCT 
GTCAAAAGAATTGTTCAGCACACATCTTTTGAAAGTAAGAAGAAAAACCAGATGACCA 
ACT ATG T AATG AT AA CCTG TG ACATC ATGGACCACT C C ATCTC C C CATTT ATGAGG AA 
AGGGACCGTTGGAGAGTGGAAGGATTACTTCTCAGCAGCACAGAATAAGAGATTTGAT 
GAAGACAGGAAAATGGCTGACTCTTCTCTGACCTTCCACACGGAGCTCTAAAGAGAGA 
GAGACAAAGTCTATACTACACAGGGGCAC 




ORF Start: ATG at 34 


ORF Stop: TAA at 919 




SEQ ID NO: 150


295 aa 


MW at 34853.7kD 


NOV48a, 

CG57851-01 Protein Sequence 


MAKEETSELEWGLLPPEEFSQVNGIILQKKMCDFWDKIWNFQAKPDDLLIASYPKAGT 
TWTQEIVDLIQNDGDIEKSRRASIQLQHPFLEWIRMTHARKIFAGIDQANTMPSPRTL
KTHLPVQLLPPSFWEENCKIIYVARNAKDNLVSYYHFQRMSKALPDVLTVGEYIMCGE 
VLWGIWEEIRTWQLHRLFCWFFDHASENPRKFKRIMEFMGNKLDEDPVKRIVQHTSFE
SKKKNQMTNYVMITCDIMDHSISPFMRKGTVGEWKDYFSAAQNKRFDEDRKMADSSLT
FHTEL 
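
The relationship between the nucleotide and polypeptide sequences of Table 48A can be checked by translating the open reading frame. The following is a minimal sketch, assuming the standard genetic code and assuming that the "ORF Start" coordinate is a 1-based position of the A of the ATG codon; the names `translate`, `CODON_TABLE` and `first_line` are illustrative and are not taken from the specification. Only the first 58-nucleotide line of the NOV48a DNA sequence is used.

```python
# Illustrative sketch (not part of the specification): translate the start of
# the NOV48a open reading frame using the standard genetic code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table; '*' marks a stop codon.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, orf_start: int) -> str:
    """Translate dna from the 1-based position orf_start.

    Translation stops at the first stop codon or when fewer than three
    nucleotides remain.
    """
    residues = []
    for pos in range(orf_start - 1, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


# First line of the NOV48a DNA sequence (Table 48A); the ORF starts at 34.
first_line = "CCCCTGCTGGTGCCCAAGACCACCGTGGAAGGAATGGCTAAAGAGGAGACAAGTGAGT"
peptide = translate(first_line, orf_start=34)
print(peptide)
```

Under these assumptions the translated residues agree with the first eight residues of the NOV48a protein sequence (MAKEETSE).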



Further analysis of the NOV48a protein yielded the following properties shown in 
Table 48B. 



Table 48B. Protein Sequence Properties NOV48a 


PSort 
analysis: 


0.6400 probability located in microbody (peroxisome); 0.4500 probability 
located in cytoplasm; 0.1000 probability located in mitochondrial matrix space; 
0.1000 probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
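
The PSort result in Table 48B is reported as a list of localization probabilities. A minimal sketch of how such a line could be parsed and ranked follows; the function name `parse_psort` and the regular expression are assumptions made for illustration and are not part of the PSort program.

```python
import re

# PSort line for NOV48a, joined from Table 48B.
PSORT_TEXT = (
    "0.6400 probability located in microbody (peroxisome); 0.4500 probability "
    "located in cytoplasm; 0.1000 probability located in mitochondrial matrix space; "
    "0.1000 probability located in lysosome (lumen)"
)


def parse_psort(text):
    """Return a list of (probability, location) pairs from a PSort summary line."""
    pattern = re.compile(r"(\d\.\d+) probability\s+located in ([^;]+)")
    return [(float(prob), location.strip()) for prob, location in pattern.findall(text)]


predictions = parse_psort(PSORT_TEXT)
# The highest-probability entry is taken as the predicted localization.
best_prob, best_location = max(predictions)
print(best_location, best_prob)
```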
A search of the NOV48a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 48C. 



Table 48C. Geneseq Results for NOV48a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV48a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAE12209 


Human ST drug-metabolising protein 
2 encoded by DNA transcript 2 - 
Homo sapiens, 304 aa. 
[WO200172977-A2, 04-OCT-2001] 


16..295 
15..304 


137/293(46%) 
200/293 (67%) 


9e-74 


AAE12210 


Human ST drug-metabolising protein 
3 encoded by cDNA - Homo sapiens, 
304 aa. [WO200172977-A2, 04-OCT- 
2001] 


16..295
15..304


129/293 (44%)
190/293 (64%)


1e-67


AAE12208 


Human ST drug-metabolising protein 
1 encoded by DNA transcript 1 - 
Homo sapiens, 304 aa. 
[WO200172977-A2, 04-OCT-2001] 


16..295
15..304


128/293 (43%) 
190/293 (64%) 


6e-67 


AAE05178 


Human drug metabolising enzyme
(DME-9) protein - Homo sapiens, 304 
aa. [WO200151638-A2, 19-JUL-2001] 


16..295
15..304


128/293 (43%) 
189/293 (63%) 


1e-66


AAY67294 


Human STP2 (phenol 
sulphotransferase 2) amino acid 
sequence - Homo sapiens, 295 aa. 
[WO9964630-A1, 16-DEC-1999] 


15..295
10..295 


133/292 (45%) 
186/292(63%) 


5e-66 



In a BLAST search of public sequence databases, the NOV48a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 48D. 



Table 48D. Public BLASTP Results for NOV48a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV48a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q90WR6 


SULFOTRANSFERASE 1C - Gallus 
gallus (Chicken), 307 aa. 


3..295
5..307 


170/304 (55%)
218/304 (70%)


3e-94 


P50237


(EC 2.8.2.-) (HAST-I) - Rattus 
norvegicus (Rat), 304 aa.


1..304


222/308 (71%)


3e-92


O70262 


PHENOL SULFOTRANSFERASE - 
Mus musculus (Mouse), 304 aa. 


18..295 
19..304 


164/289 (56%)
215/289 (73%)


1e-91


O75897


Sulfotransferase 1C2 (EC 2.8.2.-) 
(SULT1C) (SULT1C#2) - Homo 
sapiens (Human), 302 aa. 


22..292
22..299 


160/282 (56%)
203/282 (71%)


1e-87




Sulfotransferase 1C1 (EC 2.8.2.-) 
(SULT1C#1) (ST1C2) 
(humSULTC2) - Homo sapiens 
(Human), 296 aa. 


1 O ^Af 

18..295
12..296


149/289 (51%)
201/289 (68%)


1e-80



PFam analysis predicts that the NOV48a protein contains the domains shown in the 
Table 48E. 



Table 48E. Domain Analysis of NOV48a 


Pfam Domain 


NOV48a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


Sulfotransfer: domain 1 of 
1 


23..285


116/298(39%) 
207/298 (69%) 


6.2e-82 



Example 49. 



The NOV49 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 49A. 



Table 49A. NOV49 Sequence Analysis 




SEQ ID NO: 151


1934 bp 


NOV49a, 

CG59377-01 DNA Sequence 


CTTGATTACGGAGACTGAACCTTCATAGGGTGCGCACTTACCAAGGACAGGAAGGTTT


CTCTGTTTGAAGGGCTTTAAACTTATAACAAAGAAAATAAAAATGACGACTTCGTCTA 


TCAGACGGCAGATCAAAAACATCGTGAACAATTACTCAGAGGCAGAAATCAAAGTCCG 
GGAAGCCACCTCCAATGACCCGTGGGGCCCGTCCAGTTCTCTGATGACCGAGATTGCC 
GACCTGACCTACAACGTGGTGGCCTTCTCGGAGATCATGAGCATGGTGTGGAAGCGGC
TGAATGACCATGGCAAGAACTGGCGGCATGTGTACAAGGCGCTGACCCTGCTGGACTA
CCTCATCAAGACAGGCTCCGAACGTGTGGCCCAGCAGTGCCGGGAGAACATCTTCGCC 
ATCCAGACCCTGAAGGACTTCCAGTACATTGACCGAGATGGCAAGGACCAGGGCATCA 
ATGTGCGTGAGAAGTCAAAGCAACTGGTGGCTCTCCTCAAGGACGAGGAACGGTTGAA 
GGCTGAGAGGGCCCAGGCTCTCAAAACCAAAGAGCGCATGGCCCAGGTTGCCACTGGC 
ATGGGCAGCAACCAGATCACCTTTGGGCGAGGCTCCAGCCAGCCCAACCTCTCCACCA
GCCACTCGGAGCAGGAGTATGGCAAGGCCGGGGGCTCCCCGGCCTCCTACCATGGCTC 
CACCTCCCCGCGAGTGTCCTCCGAGCTGGAGCAAGCCCGGCCCCAGACTAGTGGAGAA 
GAGGAGCTTCAGCTGCAGCTGGCACTTGCCATGAGCAGAGAAGTGGCTGAGCAGGAAG 
AACGCCTCAGGCGGGGTGATGACCTCAGATTACAGATGGCCCTGGAAGAAAGCCGAAG 
GGACACAGTTAAAATTCCAAAAAAGAAAGAGCAGACTACGCTGTTGGATTTAATGGAT 
GCTCTCCCCAGCTCGGGCCCCGCGGCCCAGAAAGCAGAGCCCTGGGGCCCGTCAGCCT 
CCACTAACCAGACCAACCCCTGGGGCGGGCCAGCGGCTCCTGCGAGTACTTCAGACCC 
CTGGCCATCGTTTGGTACCAAGCCAGCTGCCTCCATTGACCCATGGGGGGTGCCCACT 
GGAGCCACCGCACAATCTGTCCCCAAGAACTCGGACCCCTGGGCAGCTTCACAGCAGC 
CTGCCTCCAGTGCTGGGAAAAGAGCTTCTGACGCGTGGGGCGCAGTCTCCACCACCAA 
GCCCGTGTCTGTCTCTGGGTCCTTTGAGCTCTTCAGTAATCTGAATGGTACAATTAAA 
GATGACTTTTCTGAATTTGACAACCTTCGGACTTCAAAAAAAACAGCCGAATCTGTGA 
CCTCTCTGCCATCCCAAAACAATGGAACTACCAGCCCTGACCCCTTTGAGTCTCAACC 
CCTGACTGTCGCCTCAAGCAAGCCCAGCAGTGCCCGGAAAACACCTGAGTCCTTCCTG 
GGCCCCAACGCGGCCCTGGTGAACCTGGACTCACTGGTGACCAGGCCTGCCCCACCAG 
CCCAGTCCCTCAACCCTTTCCTGGCACCAGGTGCTCCCGCCACCTCGGCCCCTGTTAA 
CCCTTTCCAGGTGAACCAGCCCCAGCCGCTGACACTGAACCAGCTTCGGGGGAGCCCA 
GTCCTGGGGACCAGCACATCCTTTGGGCCTGGCCCAGGAGTGGAGTCCATGGCTGTGG 
CCTCGATGACCTCCGCGGCCCCACAGCCAGCTCTGGGGGCCACTGGTTCCTCTCTGAC 
ACCACTGGGCCCTGCAATGATGAACATGGTGGGCAGTGTGGGTATACCCCCATCAGCA 
GCCCAGGCCACTGGCACAACCAACCCTTTCCTTCTCTAGTGCCTGGGCCTGGGACCCA 
CCCAGAGCACCTGTGCTGGAGGATGCCGAGCAGGGACTCTCGTCTGTGGGACGGGATC 


CAAGAGTTTGGGGATTAGGG 




ORF Start: ATG at 101 


ORF Stop: TAG at 1835 




SEQ ID NO: 152


578 aa | MW at 61651.2kD


NOV49a, 

CG59377-01 Protein Sequence 


MTTSSIRRQMKNIVNNYSEAEIKVREATSNDPWGPSSSIWTEIADLTYNVVAPSEIMS 
MVWKRLNDHGKNWRHVYKALTLLDYLIKTGSERVAQQCRENIFAIQTLKDFQYIDRDG 
KDQGINVREKSKQLVALLKDEERLKAERAQALKTKERMAQVATGMGSNQITFGRGSSQ 
PNLSTSHSEQEYGKAGGSPASYHGSTSPRVSSELEQARPQTSGEEELQLQLALAMSRE 
VAEQEERLRRGDDLRLQMALEESRRDTVKIPKKKEQTTLLDLMDALPSSGPAAQKAEP
WGPSASTNQTNPWGGPAAPASTSDPWPSFGTKPAASIDPWGVPTGATAQSVPKNSDPW
AASQQPASSAGKRASDAWGAVSTTKPVSVSGSFELFSNLNGTIKDDFSEFDNLRTSKK 
TAESVTSLPSQNNGTTSPDPFESQPLTVASSKPSSARKTPESFLGPNAALVNLDSLVT 
RPAPPAQSLNPFLAPGAPATSAPVNPFQVNQPQPLTLNQLRGSPVLGTSTSFGPGPGV 
ESMAVASMTSAAPQPALGATGSSLTPLGPAMMNMVGSVGIPPSAAQATGTTNPFLL 
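
The ORF coordinates and polypeptide lengths stated in the sequence tables can be cross-checked arithmetically. The sketch below assumes that "ORF Start" is the 1-based position of the first base of the ATG codon and that "ORF Stop" is the 1-based position of the first base of the stop codon, so that the coding region spans (stop - start) nucleotides. The dictionary `ORFS` and the function `orf_length_aa` are illustrative names; the values are copied from Tables 48A, 49A, 50A, 51A, 52A and 53A.

```python
# Illustrative consistency check: (ORF start, ORF stop, stated length in aa).
ORFS = {
    "NOV48a": (34, 919, 295),
    "NOV49a": (101, 1835, 578),
    "NOV50a": (1, 2578, 859),
    "NOV51a": (39, 1248, 403),
    "NOV52a": (113, 1322, 403),
    "NOV53a": (75, 2988, 971),
}


def orf_length_aa(start: int, stop: int) -> int:
    """Number of residues encoded between the start codon and the stop codon.

    Assumes start is the first base of ATG and stop is the first base of the
    stop codon (both 1-based), so the stop codon itself is not counted.
    """
    span = stop - start
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


for name, (start, stop, stated) in ORFS.items():
    computed = orf_length_aa(start, stop)
    print(name, computed, "matches" if computed == stated else "differs")
```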



Further analysis of the NOV49a protein yielded the following properties shown in 
Table 49B. 



Table 49B. Protein Sequence Properties NOV49a 


PSort 
analysis: 


0.4936 probability located in mitochondrial matrix space; 0.3000 probability 
located in nucleus; 0.2087 probability located in mitochondrial inner membrane; 
0.2087 probability located in mitochondrial intermembrane space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV49a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 49C. 



Table 49C. Geneseq Results for NOV49a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV49a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for
the Matched 
Region 


Expect 
Value 


AAB93525 


Human protein sequence SEQ ID 
NO: 12872 - Homo sapiens, 584 aa. 
[EP1074617-A2, 07-FEB-2001] 


1..578 
1..584 


578/584 (98%)
578/584 (98%)


0.0 


AAB95663


Human protein sequence SEQ ID 
[EP1074617-A2, 07-FEB-2001]


40..403
1..370


364/370 (98%)
364/370 (98%)


0.0

AAB93011 


Human protein sequence SEQ ID 
NO: 11762 - Homo sapiens, 484 aa. 
[EP1074617-A2, 07-FEB-2001] 


1..407 
1..470 


385/470(81%) 
390/470 (82%) 


0.0 


AAB42049 


Human ORFX ORF1813 polypeptide 
sequence SEQ ID NO:3626 - Homo 
sapiens, 551 aa. [WO200058473-A2, 
05-OCT-2000] 


1..578 
1..551 


306/636(48%) 
370/636 (58%)


e-141 


AAB95100 


Human protein sequence SEQ ID 
NO: 17064 - Homo sapiens, 576 aa. 
[EP1074617-A2, 07-FEB-2001] 


1..578 
1..576 


298/636 (46%)
371/636 (57%)


e-137 
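
The Expect values in Table 49C include entries written as "e-141", which is read here as shorthand for 1e-141. A minimal sketch of ranking the hits by Expect value follows; the helper name `parse_evalue` is illustrative, and the reading of the shorthand is an assumption stated above rather than something defined in the specification.

```python
# Expect values for the Geneseq hits of NOV49a, copied from Table 49C.
HITS = {
    "AAB93525": "0.0",
    "AAB95663": "0.0",
    "AAB93011": "0.0",
    "AAB42049": "e-141",
    "AAB95100": "e-137",
}


def parse_evalue(text: str) -> float:
    """Convert an Expect value string to a float.

    Assumption: a value printed without a mantissa, such as 'e-141',
    means 1e-141.
    """
    text = text.strip()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


# Smaller Expect values indicate more significant matches; sorted() is stable,
# so ties keep the order in which they appear in the table.
ranked = sorted(HITS, key=lambda accession: parse_evalue(HITS[accession]))
print(ranked)
```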



In a BLAST search of public sequence databases, the NOV49a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 49D. 



Table 49D. Public BLASTP Results for NOV49a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV49a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


O95207


EPSIN 2A - Homo sapiens (Human), 
584 aa. 


1..578 
1..584 


576/584 (98%) 
576/584 (98%) 


0.0 


Q9UPT7 


KIAA1065 PROTEIN - Homo 
sapiens (Human), 641 aa. 


1..578 
1..641 


557/641 (86%) 
562/641 (86%) 


0.0 


O95208


EPSIN 2B - Homo sapiens (Human), 
642 aa. 


1..578 
1..642 


556/642 (86%) 
560/642 (86%) 


0.0 


Q9Z1Z3 


EH DOMAIN BINDING PROTEIN 
EPSIN 2 - Rattus norvegicus (Rat), 
583 aa. 


1..578 
1..583 


512/590(86%) 
526/590 (88%) 


0.0 


O70447


INTERSECTIN-EH BINDING 
PROTEIN IBP2 - Mus musculus 
(Mouse), 509 aa (fragment). 


76..578
2..509 


438/515 (85%) 
459/515 (89%) 


0.0 



PFam analysis predicts that the NOV49a protein contains the domains shown in the 
Table 49E. 



Table 49E. Domain Analysis of NOV49a 


Pfam Domain 


NOV49a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 




ENTH: domain 1 of 1 


17..140 


70/131 (53%) 
117/131(89%) 


7.9e-68 


VHS: domain 1 of 1 


14..158 


33/160(21%) 
90/160 (56%) 


3.3 


UIM: domain 1 of 2 


217..234 


11/18(61%) 
16/18(89%) 


0.043 


UIM: domain 2 of 2 


242..259 


5/18 (28%) 
12/18(67%) 


80 



Example 50. 

The NOV50 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 50A. 



Table 50A. NOV50 Sequence Analysis 




SEQ ID NO: 153


2580 bp 


NOV50a, 

CG59258-01 DNA Sequence 


ATGCTGCTGGCCCCCTTTTATTGCTGGGTGTGTGCCCATGCTGCTGGCCCCCTTTTAT 
TGCTGGGCAGTGACAAACTGTACCATCAGTGGCTCTCCACTGTCCGGAAAGGAAGTGG
AGCAATTCTGAATACTGTAAAGACCAAAGCAAATCCGGCCATGAAGACTGTCTACAAG
TTCGACATTGCCGAGAATGGCTGCGCCCCCACCCCAGAAGAGCAGCTGCCAAAGACTG 
CACCGTCCCCACTGGTGGAGGCCAAGGACCCCAAGCTCCGAGAAGACCGGCGGCCAAT
CACAGTCCACTTTGGACAGGACCAGTCTGAGATGTCTTTCAGCTCAGCACTCACTCAC
GGCAAAGAGAGTGCCCGGACCCAGCCGGAGAGAGTCGTTGACAGGACTGGCGAGCCCC
TGAATCCTGAGCGCGCTCTCTCCGGAGATCATCTCTGGCCTGTTACGCACTTGCTCTG 
GGCAACCCTGGGCAAGTCCTTGCTTGCCCTCATCTGTGAAATGGGTAGCAGCCCTCGT 
TCCCTGCAGAGGAGCCTTGCGCTGCTGGGGACACCCCAGCTTATTTGGGAAACTGCAA
CCACCATGGCCGATGGCCCCACCACGCCCTGTCTAGGAAGCAGAGGCCTCCCCAGCAG 
CGTGTCCACTGTGCCCCTGGCCCTGCGTGAAGTGCCATCAGATGCCCCGCATCCCTGC 
AGCAGGGCCCTCGTGACTGGCCTCACAGATGAGGACACAGAGGCCCAGGGAAGTCACT
TGCTTGCCAAAGTCACTCAGCAAACCATGTCTGTCTGGCTCTCAGAAAATGGGAAAGA 
AGCCTGGGCATTCAGCCATGAGGGAGCCACGGCTGTAGCCAGTGGAATGACGTACCCT 
CAGTCCAGGATGTGCACCCGGGCAGCCAGGTCCCACAGCCACTACTTTCTTGCCCCCA 
CCACTGCTCCCACAGTTCCCAGAACTCAGTCTCCAGATCTGGGCTCCAGGATGCAGAG 
GCTGTCCTCAGGCCTGGTAAAGCCCTTGCGACACTATGCGGTCTTCCTCTCCGAAGAC 
TCCTCTGATGATGAATGCCAGCGGGAAGAGGGCCCGAGCTCTGGCTTCACCGAGAGCT 
TTTTCTTCTCCGCTCCCTTTGAATGGCCGCAGCCGTATCGGACACTCAGGGAGTCAGA 
CAGCGCGGAAGGCGACGAGGCAGAGAGTCCAGAGCAGCAAGTGCGGAAGTCCACAGGC
CCTGTCCCAGCTCCCCCTGACCGGGCTGCCAGCATCGACCTTCTGGAAGACGTCTTCA 
GCAACCTGGACATGGAGGCCGCACTGCAGCCACTGGGCCAGGCCAAGAGCTTAGAGGA
CCTTCGTGCCCCCAAAGACCTGAGGGAGCAGCCAGGGACCTTTGACTATCAGAGGCTG
GATCTGGGCGGGAGTGAGAGGAGCCGCGGGGTGACAGTGGCCTTGAAGCTTACCCACC 
CGTACAACAAGCTCTGGAGCCTGGGCCAGGACGACATGGCCATCCCCAGCAAGCCCCC 
AGCTGCCTCCCCTGAGAAGCCCTCAGCCCTGCTCGGAAACTCCCTGGCCCTGCCTCGA 
AGGCCCCAGAACCGGGACAGCATCCTGAACCCCAGTGACAAGGAGGAGGTGCCCACCC 
CTACTCTGGGCAGCATCACCATCCCCCGGCCCCAAGGCAGGAAGACCCCAGAGCTGGG
CATCGTGCCTCCACCGCCCATTCCCCGCCCGGCCAAGCTCCAGGCTGCCGGCGCCGCA 
CTTGGTGACGTCTCAGAGCGGCTGCAGACGGATCGGGACAGGCGAGCTGCCCTGAGTC 
CAGGGCTCCTGCCTGGTGTTGTCCCCCAAGGCCCCACTGAACTGCTCCAGCCGCTCAG 
CCCTGGCCCCGGGGCTGCAGGCACGAGCAGTGACGCCCTGCTCGCCCTCCTGGACCCG 
CTCAGCACAGCCTGGTCAGGCAGCACCCTCCCGTCACGCCCCGCCACCCCGAATGTAG
CCACCCCATTCACCCCCCAATTCAGCTTCCCCCCTGCAGGGACACCCACCCCATTCCC
ACAGCCACCACTCAACCCCTTTGTCCCATCCATGCCAGCAGCCCCACCCACCCTGCCC
CTGGTCTCCACACCAGCCGGGCCTTTCGGGGCCCCTCCAGCTTCCCTGGGGCCGGCTT 
TTGCGTCCGGCCTCCTGCTGTCCAGTGCTGGCTTCTGTGCCCCTCACAGGTCTCAGCC
CAACCTCTCCGCCCTCTCCATGCCCAACCTCTTTGGCCAGATGCCCATGGGCACCCAC
ACGAGCCCCCTACAGCCGCTGGGTCCCCCAGCAGTTGCCCCGTCGAGGATCCGAACGT 
TGCCCCTGGCCCGCTCAAGTGCCAGGGCTGCTGAGACCAAGCAGGGGCTGGCCCTGAG 
GCCTGGAGACCCCCCGCTTCTGCCTCCCAGGCCCCCTCAAGGCCTGGAGCCAACACTG 
CAGCCCTCTGCTCCTCAACAGGCCAGAGACCCCTTTGAGGATTTGTTACAGAAAACCA
AGCAAGACGTGAGCCCGAGTCCGGCCCTGGCCCCGGCCCCAGACTCGGTGGAGCAGCT 
CAGGAAGCAGTGGGAGACCTTCGAGTGA




ORF Start: ATG at 1 


ORF Stop: TGA at 2578 




SEQ ID NO: 154


859 aa 


MW at 91746.7kD


NOV50a, 

CG59258-01 Protein Sequence 


MLLAPFYCWVCAHAAGPLLLLGSDKLYHQWLSTVWCGSGAILNTVKTKAHPAMKTVYK
FDIAENGCAPTPEEQLPKTAPSPLVEAKDPKLREDRRPITVHFGQDQSEMSFSSALTH 
GKESARTQPERVVDRTGEPLNPERALSGDHLWPVTHLLWATLGKSLLALICEMGSSPR
SLQRSLALLGTPQLIWETATTMADGPTTPCLGSRGLPSSVSTVPLALREVPSDAPHPC
SRALVTGLTDEDTEAQGSHLLAKVTQQTMSVWLSENGKEAWAFSHEGATAVASGMTYP
QSRMCTRAARSHSHYFLAPTTAPTVPRTQSPDLGSRMQRLSSGLVKPLRHYAVFLSED 
SSDDECQREEGPSSGFTESFFFSAPFEWPQPYRTLRESDSAEGDEAESPEQQVRKSTG 
PVPAPPDRAASIDLLEDVFSNLDMEAALQPLGQAKSLEDLRAPKDLREQPGTFDYQRL
DLGGSERSRGVTVALKLTHPYNKLWSLGQDDMAIPSKPPAASPEKPSALLGNSLALPR
RPQNRDSILNPSDKEEVPTPTLGSITIPRPQGRKTPELGIVPPPPIPRPAKLQAAGAA
LGDVSERLQTDRDRRAALSPGLLPGVVPQGPTELLQPLSPGPGAAGTSSDALLALLDP
LSTAWSGSTLPSRPATPNVATPFTPQFSFPPAGTPTPFPQPPLNPFVPSMPAAPPTLP 
LVSTPAGPFGAPPASLGPAFASGLLLSSAGFCAPHRSQPNLSALSMPNLFGQMPMGTH 
TSPLQPLGPPAVAPSRIRTLPLARSSARAAETKQGLALRPGDPPLLPPRPPQGLEPTL
QPSAPQQARDPFEDLLQKTKQDVSPSPALAPAPDSVEQLRKQWETFE 



Further analysis of the NOV50a protein yielded the following properties shown in 
Table 50B. 



Table 50B. Protein Sequence Properties NOV50a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.1940 probability located in lysosome (lumen); 0.1000 
probability located in mitochondrial matrix space 


SignalP 
analysis: 


Likely cleavage site between residues 15 and 16 
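
The SignalP result in Table 50B places the likely cleavage site between residues 15 and 16. The sketch below shows, under that prediction, how the sequence would divide into a putative signal peptide and a mature chain. It uses only the first 20 residues of NOV50a as printed in Table 50A; the function name `split_signal_peptide` is illustrative, and no claim is made that the cleavage actually occurs.

```python
# First 20 residues of the NOV50a protein sequence (Table 50A).
N_TERMINUS = "MLLAPFYCWVCAHAAGPLLL"
FULL_LENGTH = 859      # stated length of NOV50a in amino acids
CLEAVAGE_AFTER = 15    # SignalP: cleavage between residues 15 and 16


def split_signal_peptide(sequence: str, cleavage_after: int):
    """Split a sequence into (signal peptide, remainder) at a 1-based position."""
    return sequence[:cleavage_after], sequence[cleavage_after:]


signal, mature_start = split_signal_peptide(N_TERMINUS, CLEAVAGE_AFTER)
mature_length = FULL_LENGTH - CLEAVAGE_AFTER
print(signal, mature_start, mature_length)
```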



A search of the NOV50a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 50C. 



Table 50C. Geneseq Results for NOV50a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV50a 
Residues/ 
Match 
Residues


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM41501 


Human polypeptide SEQ ID NO 
6432 - Homo sapiens, 545 aa. 
[WO200153312-A1, 26-JUL-2001] 


22..103 
401..482


82/82 (100%) 
82/82 (100%) 


2e-42 


AAM39715 


Human polypeptide SEQ ID NO 
2860 - Homo sapiens, 559 aa. 
[WO200153312-A1, 26-JUL-2001]


22..103
396..496


82/101 (81%) 
82/101 (81%) 


6e-39 


AAW31855 


Mycobacterium tuberculosis 55 kDa 
protein - Mycobacterium 
tuberculosis, 572 aa. 
[WO9741252-A2, 06-NOV-1997] 


498..845 
71..389


96/358 (26%) 
125/358 (34%) 


8e-12 
AAW31852 


Mycobacterium tuberculosis 74 kDa 
protein - Mycobacterium 
tuberculosis, 763 aa. 
[WO9741252-A2, 06-NOV-1997] 


498..845 
262..580 


96/358 (26%)
125/358 (34%) 


8e-12 


AAB50363 


Human SRCAP - Homo sapiens, 
2972 aa. [WO200073467-A1, 07- 
DEC-2000] 


501..845
1235..1575 


112/369 (30%)
141/369 (37%)


1e-11


In a BLAST search of public sequence databases, the NOV50a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 50D. 


Table 50D. Public BLASTP Results for NOV50a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV50a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9HCG4 


KIAA1608 PROTEIN - Homo 
sapiens (Human), 603 aa 
(fragment). 


309..859 
62..603 


501/555 (90%) 
510/555 (91%) 


0.0 


Q9H796 


CDNA: FLJ21129 FIS, CLONE 
CAS06266 - Homo sapiens 
(Human), 559 aa. 


22..103
396..496 


81/101 (80%) 
81/101 (80%) 


2e-37 


AAK44515 


HYPOTHETICAL 58.5 KDA 
PROTEIN - Mycobacterium 
tuberculosis CDC1551, 598 aa. 


499..845 
299..562 


104/354(29%) 
121/354(33%) 


8e-14 


Q9SN46 


EXTENSIN-LIKE PROTEIN - 
Arabidopsis thaliana (Mouse-ear 
cress), 839 aa. 


604..848 
407..626 


73/249 (29%) 
100/249 (39%) 


3e-12 


Q41805 


EXTENSIN-LIKE PROTEIN 
PRECURSOR - Zea mays (Maize), 
1188 aa. 


492..848 
415..749 


88/361 (24%) 
124/361 (33%) 


5e-12 



PFam analysis predicts that the NOV50a protein contains the domains shown in the 
Table 50E. 



Table 50E. Domain Analysis of NOV50a 


Pfam Domain 


NOV50a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 




No Significant Matches Found 



Example 51. 

The NOV51 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 51A. 



Table 51A. NOV51 Sequence Analysis 




SEQ ID NO: 155 | 1394 bp


NOV51a, 

CG59492-01 DNA Sequence 


GTGGCCTGCTCCTGCAGCAATCCCAGGACCCCCTGCTCATGGGGCTGTTTCCTACTAA


CCCCAAAGAGAAGACCCAGGAGGAACCCCCTGGCCAGAGCAGGGCCCCTGTGTTGACC 
GTGGTGTCCAAGTTCAAGGCCTCACTGGAGCAGCTTCTGCAGGTCCTACACAGCACCA
CGCCCCACTACATTCGCTGCATCAAGCCCAACAGCCAGGGCCAGGCGCAGACCTTTCT
CCAAGAGGAGGTCCTGAGCCAGCTGGAGGCCTGTGGCCTCGTGGAGACCATCCATATC 
AGTGCTGCTGGCTTCCCCATCCGGGTCTCTCACCGAAACTTTGTAGAACGATACAAGT 
TACTAAGAAGGCTTCATCCTTGCACATCCTCTGGCCCCGACAGCCCATATCCTGCCAA
AGGGCTCCCTGAATGGTGTCCACACAGCGAGGAAGCCACGCTTGAACCTCTCATCCAG 
GACATTCTCCACACTCTGCCGGTCCTAACTCAGGCAGCAGCCATAACTGGTGACTCGG
CTGAGGCCATGCCAGCCCCCATGCACTGTGGCAGGACCAAGGTGTTCATGACTGACTC
TATGCTGGAGCTTCTGGAATGTGGGCGTGCCCGGGTGCTGGAGCAGTGTGCCCGCTGC 
ATCCAGGGTGGCTGGAGGCGACACCGGCACCGAGAGCAGGAGCGGCAGTGGCGGGCCG 
TCATGCTCATCCAGGCAGCCATTCGTTCCTGGTTAACTCGGAAACACATCCAGAGGCT 
GCATGCAGCTGCCACAGTCATCAAGCGTGCATGGCAGAAGTGGAGAATCAGAATGGCC 
TGCCTTGCTGCTAAAGAGCTGGATGGTGTGGAAGAAAAACACTTCTCTCAAGCTCCCT 
GTTCCCTGAGCACCTCGCCGCTGCAGACCAGGCTCCTGGAGGCAATAATCCGCTTCTG 
GCCCCTGGGACTGGTCCTGGCCAATACGGCTATGGGTGTAGGCAGCTTTCAGAGGAAA
TTAGTGGTCTGGGCTTGCCTCCAGCTCCCCAGGGGCAGCCCCAGTAGCTACACTGTCC 
AGACAGCACAAGACCAGGCTGGTGTCACGTCCATCCGAGCGCTGCCTCAGGGATCGAT
AAAGTTTCACTGCAGAAAGTCTCCACTGCGGTATGCTGACATCTGCCCTGAACCTTCA 
CCCTACAGCATTGCAGGCTTTAATCAGATTCTGCTGGAAAGACACAGGCTGATCCACG
TGACCTCTTCTGCCTTCACTGGGCTGGGGTGATCCTTGGTGCCTTTGTTTCCACAAGG 
CCTTTTCCTGCCCCCTGCCTTGCCAAAGACATTTAATCAGCACACAGCTGCCAGACTA 


TTCCCACAGTGCTCCAAATGCACATGAACAACAGTGACGGCTCCAGCCTTCGACCCAG 


AG 




ORF Start: ATG at 39 | ORF Stop: TGA at 1248




SEQ ID NO: 156 | 403 aa | MW at 45142.8kD


NOV51a, 

CG59492-01 Protein Sequence 


MGLFPTNPKEKTQEEPPGQSRAPVLTVVSKFKASLEQLLQVLHSTTPHYIRCIKPNSQ
GQAQTFLQEEVLSQLEACGLVETIHISAAGFPIRVSHRNFVERYKLLRRLHPCTSSGP 
DSPYPAKGLPEWCPHSEEATLEPLIQDILHTLPVLTQAAAITGDSAEAMPAPMHCGRT 
KVFMTDSMLELLECGRARVLEQCARCIQGGWRRHRHREQERQWRAVMLIQAAIRSWLT 
RKHIQRLHAAATVIKRAWQKWRIRMACLAAKELDGVEEKHFSQAPCSLSTSPLQTRLL 
EAIIRFWPLGLVLANTAMGVGSFQRKLVVWACLQLPRGSPSSYTVQTAQDQAGVTSIR
ALPQGSIKFHCRKSPLRYADICPEPSPYSIAGFNQILLERHRLIHVTSSAFTGLG



Further analysis of the NOV51a protein yielded the following properties shown in 
Table 51B. 



Table 51B. Protein Sequence Properties NOV51a 


PSort 
analysis: 


0.3000 probability located in nucleus; 0.2029 probability located in lysosome 
(lumen); 0.1000 probability located in mitochondrial matrix space; 0.0320 
probability located in microbody (peroxisome) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
A search of the NOV51a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 51C. 



Table 51C. Geneseq Results for NOV51a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV51a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAY94290 


Human myosin heavy chain 
homologue - Homo sapiens, 612 aa. 
[WO200026372-A1, 11-MAY-2000] 


1..403 
210..612 


401/403 (99%) 
401/403 (99%)


0.0 


AAU23676 


Novel human enzyme polypeptide 
#762 - Homo sapiens, 387 aa. 
[WO200155301-A2, 02-AUG-2001] 


17..403 
1..387 


384/387 (99%) 
384/387(99%) 


0.0 


ABB10243


Human cDNA SEQ ID NO: 551 - 
Homo sapiens, 570 aa. 
[WO200154474-A2, 02-AUG-2001] 


1..365 
206..570 


365/365 (100%)
365/365(100%) 


0.0 


AAU23123


Novel human enzyme polypeptide 
#209 - Homo sapiens, 567 aa. 
[WO200155301-A2, 02-AUG-2001]


1..365 
203..567 


364/365 (99%) 
364/365 (99%) 


0.0 


AAM23563 


Human EST encoded protein SEQ 
ID NO: 1088 - Homo sapiens, 477 
aa. [WO200154477-A2, 02-AUG- 
2001] 


1..189 
288..476 


188/189 (99%) 
188/189 (99%)


e-108 



In a BLAST search of public sequence databases, the NOV51a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 51D. 

Table 51D. Public BLASTP Results for NOV51a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV51a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96H55 


HYPOTHETICAL 86.7 KDA 
PROTEIN - Homo sapiens (Human), 
770 aa. 


72..403 
439..770 


330/332 (99%) 
330/332 (99%) 


0.0 


Q9D2Z3 


1110055A02RIK PROTEIN (RIKEN 
CDNA 1110055A02 GENE) - Mus 
musculus (Mouse), 395 aa. 


3..394 
2..395 


288/394 (73%) 
320/394 (81%) 


e-162 


Q948A2 


PUTATIVE MYOSIN HEAVY 
CHAIN - Oryza sativa (Rice), 1601 aa. 


2..255 
663..876 


84/258 (32%) 
125/258 (47%) 


1e-23


O74805


HYPOTHETICAL MYOSIN-LIKE 
PROTEIN C2D10.14C IN 
CHROMOSOME II - 
Schizosaccharomyces pombe (Fission 
yeast), 1471 aa. 


20..347 
615..903 


96/340 (28%) 
152/340 (44%) 


1e-21


T30148 


hypothetical protein E02C12.1 - 
Caenorhabditis elegans, 1019 aa. 


5..249
619..830 


74/248 (29%) 
119/248(47%) 


6e-21 



PFam analysis predicts that the NOV51a protein contains the domains shown in the 
Table 51E. 



Table 51E. Domain Analysis of NOV51a 


Pfam Domain 


NOV51a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


myosin head: domain 1 of 
1 


26..105 


37/81 (46%) 
60/81 (74%) 


5.1e-25 



Example 52. 

The NOV52 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 52A. 



Table 52A. NOV52 Sequence Analysis 

SEQ ID NO: 157 


1380 bp 


NOV52a, 

CG59564-01 DNA Sequence 


TAGAATTCCAGCGGCCGCTGAAATCCTCACTCGGTCAGTTCCTCGGGCGAGTTACGGG 


GACGACCTGCGGGAGCACGCGGGCAGTGGCCGGACGCTGAAGCCCAGGAGAGCGATGG


AGACGTATGCGGAGGTTGGGAAGGAGGGCAAGCCTTCCTGTGCATCGGTGGATCTGCA 

GGGAGACAGCTCCTTACAGGTGGAGATTTCTGACGCAGTGAGTGAGCGGGACAAGGTG

AAATTCACTGTTCAAACAAAGAGCTGCCTCCCTCACTTCGCCCAGACCGAGTTCTCAG 

TCGTGCGGCAGCACGAGGAGTTCATCTGGCTGCATGATGCCTACGTGGAGAATGAGGA 

GTACGCCGGCCTCATCATCCCCCCAGCCCCTCCGAGGCCAGACTTTGAGGCTTCGAGG 

GAAAAGCTACAGAAATTGGGCGAGGGGGACAGCTCTGTCACTCGGGAAGAGTTTGCCA 

AGATGAAGCAGGAGCTGGAAGCGGAGTACCTGGCCATCTTTAAGAAGACAGTTGCGAT

GCACGAAGTCTTTCTCCAGCGCCTGGCGGCCCACCCCACCCTGCGTCGAGACCACAAC 

TTCTTTGTGTTTTTGGAATATGGACAGGATCTGAGTGTCCGGGGGAAGAACAGGAAGG 

AGCTCCTCGGAGGGTTTCTGAGGAATATTGTGAAGTCCGCGGATGAAGCCCTCATCAC 

GGGCATGTCAGGGCTCAAGGAGGTGGATGACTTCTTTGAGCATGAGAGGACCTTCCTG

TTGGAGTATCACACCCGTATCCGAGATGCCTGCCTGCGGGCCGACCGCGTCATGCGCG

CCCACAAGTGCCTGGCAGACGATTATATCCCTATCTCAGCTGCGCTGAGCAGTCTGGG 

AACACAGGAAGTCAACCAGCTAAGGACGAGCTTCCTCAAATTGGCAGAGCTCTTTGAC 

CGGCTGAGGAAGCTGGAGGGCCGGGTGGCTTCCGATGAGGACCTGAAGCTGTCAGACA 

TGCTGAGGTACTACATGCGTGACTCACAGGCAGCCAAGGACCTGCTGTACCGGCGGCT

GCGGGCACTGGCCGACTACGAGAATGCCAACAAGGCGCTGGACAAGGCGCGCACCAGG

AACCGGGAGGTGCGGCCCGCCGAGAGCCACCAGCAGCTGTGCTGCCAACGCTTCGAGC

GCCTCTCCGACTCCGCCAAGCAAGAGCTCATGGACTTCAAGTCCCGCCGGGTCTCCTC 

TTTTCGAAAGAATCTCATTGAGCTGGCAGAGCTGGAGCTCAAACACGCCAAGGCCAGC 

ACCCTGATTCTCCGGAACACCCTTGTTGCCCTAAAGGGGGAGCCTTAGAGTAGCCAGA 

GCTCAGCCAGACCCTAATCTGGGATCTCCAGTGACCAGGGTATCCC 




ORF Start: ATG at 113 


ORF Stop: TAG at 1322 




SEQ ID NO: 158


403 aa 


MW at 46384.2kD 


NOV52a, 

CG59564-01 Protein Sequence 


METYAEVGKEGKPSCASVDLQGDSSLQVEISDAVSERDKVKFTVQTKSCLPHFAQTEF
SVVRQHEEFIWLHDAYVENEEYAGLIIPPAPPRPDFEASREKLQKLGEGDSSVTREEF
AKMKQELEAEYLAIFKKTVAMHEVFLQRLAAHPTLRRDHNFFVFLEYGQDLSVRGKNR 
KELLGGFLRNIVKSADEALITGMSGLKEVDDFFEHERTFLLEYHTRIRDACLRADRVM 
RAHKCLADDYIPISAALSSLGTQEVNQLRTSFLKLAELFDRLRKLEGRVASDEDLKLS
DMLRYYMRDSQAAKDLLYRRLRALADYENANKALDKARTRNREVRPAESHQQLCCQRF 
ERLSDSAKQELMDFKSRRVSSFRKNLIELAELELKHAKASTLILRNTLVALKGEP 



Further analysis of the NOV52a protein yielded the following properties shown in 
Table 52B. 



Table 52B. Protein Sequence Properties NOV52a 


PSort 
analysis: 


0.6500 probability located in cytoplasm; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.0000 probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV52a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 52C. 

Table 52C. Geneseq Results for NOV52a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV52a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAY94209 


Human TRAF four associated factor 
TFAF2 - Homo sapiens, 406 aa. 
[CA2245340-A1, 19-FEB-2000] 


17..402 
23..405 


273/386 (70%) 
333/386 (85%) 


e-160 


AAB07856 


Amino acid sequence of Smad 1 
interactor protein clone S 1 + 1 2-2 - 
Homo sapiens, 414 aa. 
[WO200047102-A2, 17-AUG-2000]


17..402
31..413 


273/386 (70%) 
333/386 (85%) 


e-160 


AAB43157 


Human ORFX ORF2921 polypeptide 
sequence SEQ ID NO:5842 - Homo 
sapiens, 460 aa. [WO200058473-A2, 
05-OCT-2000] 


17..402 
77..459 


273/386 (70%)
333/386 (85%)


e-160 


AAB58368 


Lung cancer associated polypeptide 
sequence SEQ ID 706 - Homo sapiens, 
414 aa. [WO200055180-A2, 21-SEP-2000]


17..402 
31..413 


273/386 (70%)
333/386 (85%)


e-160 


AAO13507 


Human polypeptide SEQ ID NO 
27399 - Homo sapiens, 443 aa. 
[WO200164835-A2, 07-SEP-2001] 


17..400 
61..441 


242/384 (63%)
317/384 (82%)


e-144 



In a BLAST search of public sequence databases, the NOV52a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 52D. 



Table 52D. Public BLASTP Results for NOV52a 


Protein 
Accession 
Number


Protein/Organism/Length 


NOV52a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9UNH7 


Sorting nexin 6 (TRAF4-associated 
factor 2) - Homo sapiens (Human), 406 
aa. 


17..402 
23..405 


273/386 (70%) 
333/386(85%) 


e-159 


Q9CZ03 


2810425K19RDC PROTEIN - Mus 
musculus (Mouse), 406 aa. 


17..402 
23..405 


271/386 (70%) 
333/386 (86%) 


e-159 


Q9Y5X3 


Sorting nexin 5 - Homo sapiens 
(Human), 404 aa. 


17..400 
22..402 


242/384 (63%) 
317/384 (82%) 


e-143 
Q9D8U8 


Sorting nexin 5 - Mus musculus 
(Mouse), 404 aa. 


17..400 
22..402 


241/384(62%) 
314/384(81%) 


e-142 


Q96NG4 


CDNA FLJ30934 FIS, CLONE 
FEBRA2007017, MODERATELY 
SIMILAR TO HOMO SAPIENS 
TRAF4-ASSOCIATED FACTOR 2 
MRNA - Homo sapiens (Human), 277 
aa. 


1..237 
1..237 


236/237 (99%) 
236/237(99%) 


e-134 



PFam analysis predicts that the NOV52a protein contains the domains shown in the 
Table 52E. 



Table 52E. Domain Analysis of NOV52a 


Pfam Domain 


NOV52a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


PX: domain 1 of 1 


23..164


39/160 (24%) 
103/160(64%) 


1.6e-15 



Example 53. 

The NOV53 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 53A. 



Table 53A. NOV53 Sequence Analysis 




SEQ ID NO: 159


3056 bp 


NOV53a, 

CG59553-01 DNA Sequence 


CTCCTGCGGGGTCAAATACAGAATTTACGCACCCTTGGCTTCCTTGGAGCCTAGCGGC


TCTCCCCGCGTCCAAGATGGCGGCAGAAGCAGCTGGTGGGAAATACAGAAGCACAGTC 
AGCAAAAGCAAAGACCCCTCGGGGCTGCTCATCTCTGTGATCAGGACTCTGTCTACTA 
GTGACGATGTCGAAGACAGGGAAAATGAAAAGGGTCGCCTTGAAGAAGCCTACGAGAA 
ATGTGACCGTGACCTGGATGAATTGATTGTACAGCACTACACAGAATTGACGACAGCC 
ATTCGCACATACCAGAGCATCACAGAGCGCATCACTAACTCCCGAAATAAAATAAAGC
AGGTAAAAGAGAACCTGCTTTCATGCAAGATGCTGCTGCACTGCAAACGGGATGAGCT
TCGGAAACTGTGGATTGAAGGAATTGAGCATAAGCATGTCCTGAACTTGTTGGATGAA 
ATTGAGAATATCAAGCAAGTGCCTCAAAAGCTGGAACAGTGCATGGCCAGCAAGCACT
ATCTCAGTGCCACTGACATGTTGGTGTCAGCAGTTGAGTCTTTGGAGGGCCCCCTGCT 
CCAGGTGGAAGGACTGAGTGACCTTCGACTAGAGCTTCACAGCAAGAAGATGAACCTT
CACTTGGTTCTCATAGATGAACTACACCGGCACCTGTACATCAAATCGACTAGCCGAG 
TTGTGCAGCGTAACAAGGAAAAAGGGAAAATCAGCTCCCTCGTGAAAGATGCTTCTGT
TCCTCTGATTGATGTTACAAACCTCCCTACTCCTCGAAAATTCCTTGATACCTCTCAC 
TATTCTACTGCTGGAAGCTCAAGTGTGAGGGAGATAAATCTGCAGGACATCAAGGAAG 
ATTTAGAATTGGATCCAGAGGAAAACAGCACCCTGTTTATGGGTATCCTCATTAAGGG
CTTGGCGAAACTGAAGAAGATCCCAGAAACAGTTAAGGCAATCATAGAGCGCTTGGAG 
CAGGAGTTGAAGCAAATTGTGAAGAGGTCTACAACCCAGGTGGCAGACAGTGGCTATC 
AGCGGGGGGAGAACGTTACTCTGGAGAACCAACCAAGGTTGCTTCTAGAACTGCTGGA 
GTTACTGTTTGACAAGTTTAATGCTGTAGCCGCTGCACACTCTGTGGTCCTGGGATAC
CTGCAGGACACTGTAGTGACTCCACTGACTCAGCAGGAAGATATCAAACTGTATGATA 
TGGCAGATGTATGGGTGAAGATCCAAGATGTTCTACAGATGCTATTAACTGAGTACTT 
GGATATGAAAAATACTCGTACGGCCTCTGAACCATCAGCTCAACTAAGCTATGCCAGC
ACTGGACGAGAGTTTGCAGCCTTTTTTGCCAAGAAGAAACCTCAAAGGCCAAAAAATT
CTCTTTTCAAGTTCGAATCGTCCTCCCATGCCATCAGTATGAGCGCCTATCTGCGAGA
ACAGAGAAGGGAGCTCTATAGTCGGAGTGGAGAACTGCAAGGGGGTCCTGATGACAAC
TTAATTGAAGGTGGAGGAACAAAATTTGTCTGCAAACCTGGAGCCAGAAACATTACCG
TCATATTCCACCCATTACTAAGATTTATTCAGGAGATTGAGCATGCTCTGGGTCTTGG 
CCCAGCCAAACAGTGTCCTCTTCGAGAGTTTCTCACCGTGTACATCAAAAACATCTTT 
CTCAATCAAGTCTTGGCTGAGATCAACAAGGAGATTGAAGGAGTCACTAAAACATCTG
ACCCTTTGAAGATTCTGGCCAACGCAGACACCATGAAGGTGCTGGGAGTGCAGCGGCC
TCTCCTACAGAGCACAATCATTGTGGAGAAGACAGTTCAAGACCTCCTGAACCTGATG
CATGACTTGAGTGCATATTCAGATCAATTCCTCAACATGGTGTGCGTGAAGCTCCAGG
AGTACAAGGACACCTGCACTGCAGCTTACAGGGGTATTGTCCAGTCAGAAGAAAAACT
TGTCATCAGTGCATCCTGGGCAAAAGATGATGATATCAGCAGACTCTTGAAATCTCTA
CCAAACTGGATGAATATGGCTCAACCCAAACAGCTGAGGCCAAAAAGAGAGGAGGAAG
AAGATTTCATAAGGGCAGCTTTTGGCAAGGAGTCTGAAGTTCTTATTGGGAACCTGGG
TGATAAATTAATCCCTCCACAAGACATCCTTCGTGACGTCAGTGACCTCAAAGCCTTG
GCCAACATGCATGAAAGCCTGGAATGGTTGGCAAGTCGAACAAAGTCAGCTTTCTCCA 
ATCTTTCTACATCCCAGATGCTTTCTCCTGCTCAAGACAGCCACACGAACACGGATCT 
CCCCCOVGTGTCAGAGCAGATCATGCAGACTCrC^GTGAACTTGCCAAATCGTTCCAG 
GATATGGCTGACCGCTGCTTGCTTGTCTTACATCTGGAAGTGAGGGTTCACTGTTTCC
ACTATCTTATCCCTCTTGCAAAGGAGGGGAACTATGCCATTGTGGCTAATGTGGAAAG
TATGGATTATGACCCCCTGGTGGTCAAGCTCAACAAAGATATCAGCGCCATTGAAGAG 
GCCATGAGCGCCAGCCTTCAGCAGCACAAGTTCCAGTATATCTTCGAAGGCCTGGGCC
ACCTGATCTCCTGCATCCTCATTAATGGTGCCCAGTACTTCAGGCGCATCAGTGAGTC 
TGGCATCAAGAAAATGTGTAGGAACATTTTTGTTCTTCAGCAGAATTTGACCAACATC 
ACCATGTCGCGGGAGGCAGACCTGGACTTTGCAAGGCAGTACTACGAGATGCTTTACA
ACACAGCTGACGAGCTCCTGAACCTGGTGGTGGACCAGGGTGTGAAGTACACGGAGCT
GGAGTACATCCACGCTCTGACCCTGCTGCACCGCAGCCAGACTGGGGTGGGGGAACTG
ACCACCCAGAACACGAGCTGCAGAGGAGGCTCAAAGAGATCATCTGCGAGCAGGCTGC
CATCAAGCAAGCCACCAAGGACAAGAAGATAACTACCGTTTAGCAGGGCGTACTGCGG 
TTGGTGACGGGGGTCCCCTCAGTCACACTCACTTTTTTCC 




ORF Start: ATG at 75


ORF Stop: TAA at 2988




SEQ ID NO: 160


971 aa


MW at 109984.9kD 


NOV53a, 

CG59553-01 Protein Sequence 


MAAEAAGGKYRSTVSKSKDPSGLLISVIRTLSTSDDVEDRENEKGRLEEAYEKCDRDL
DELIVQHYTELTTAIRTYQSITERITNSRNKIKQVKENLLSCKMLLHCKRDELRKLWI
EGIEHKHVLNLLDEIENIKQVPQKLEQCMASKHYLSATDMLVSAVESLEGPLLQVEGL
SDLRLELHSKKMNLHLVLIDELHRHLYIKSTSRVVQRNKEKGKISSLVKDASVPLIDV
TNLPTPRKFLDTSHYSTAGSSSVREINLQDIKEDLELDPEENSTLFMGILIKGLAKLK
KIPETVKAIIERLEQELKQIVKRSTTQVADSGYQRGENVTVENQPRLLLELLELLFDK
FNAVAAAHSVVLGYLQDTVVTPLTQQEDIKLYDMADVWVKIQDVLQMLLTEYLDMKNT
RTASEPSAQLSYASTGREFAAFFAKKKPQRPKNSLFKFESSSHAISMSAYLREQRREL
YSRSGELQGGPDDNLIEGGGTKFVCKPGARNITVIFHPLLRFIQEIEHALGLGPAKQC
PLREFLTVYIKNIFLNQVLAEINKEIEGVTKTSDPLKILANADTMKVLGVQRPLLQST
IIVEKTVQDLLNLMHDLSAYSDQFLNMVCVKLQEYKDTCTAAYRGIVQSEEKLVISAS
WAKDDDISRLLKSLPNWMNMAQPKQLRPKREEEEDFIRAAFGKESEVLIGNLGDKLIP
PQDILRDVSDLKALANMHESLEWLASRTKSAFSNLSTSQMLSPAQDSHTNTDLPPVSE
QIMQTLSELAKSFQDMADRCLLVLHLEVRVHCFHYLIPLAKEGNYAIVANVESMDYDP
LVVKLNKDISAIEEAMSASLQQHKFQYIFEGLGHLISCILINGAQYFRRISESGIKKM
CRNIFVLQQNLTNITMSREADLDFARQYYEMLYNTADELLNLVVDQGVKYTELEYIHA
LTLLHRSQTGVGELTTQNTSCRGGSKRSSASRLPSSKPPRTRR
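
The MW figure listed with each protein sequence is presumably derived from the amino acid composition. The sketch below shows one common way to make such an estimate; it assumes standard average residue masses and adds one water for the chain termini. The function name and the mass table are illustrative and are not taken from the specification. The result is in daltons, which is the magnitude of the figures reported here (roughly 110,000 for a 971 aa chain).

```python
# Average residue masses in daltons (amino acid minus one water).
# These are standard textbook values, not values from the specification.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Estimate the average molecular weight (Da) of an unmodified chain."""
    seq = "".join(protein.split()).upper()  # tolerate whitespace in listings
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognized residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER
```

An estimate of this kind ignores post-translational modification, so it need not reproduce the listed figure to the last decimal place.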



Further analysis of the NOV53a protein yielded the following properties shown in 
Table 53B. 



Table 53B. Protein Sequence Properties NOV53a 


PSort 
analysis: 


0.5500 probability located in endoplasmic reticulum (membrane); 0.1900 
probability located in lysosome (lumen); 0.1000 probability located in 
endoplasmic reticulum (lumen); 0.1000 probability located in outside 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV53a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 53C.



Table 53C. Geneseq Results for NOV53a


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV53a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB93175 


Human protein sequence SEQ ID 
NO:12114 - Homo sapiens, 974 aa.
[EP1074617-A2, 07-FEB-2001] 


1..947 
1..947 


947/947 (100%) 
947/947 (100%) 


0.0 


AAW69801 


Amino acid sequence of rsec8, a 
protein present in SA-17S complex - 
Rattus sp, 975 aa. [WO9828419-A2,
02-JUL-1998] 


1..947 
1..948 


902/948(95%) 
925/948(97%) 


0.0 


AAB95143 


Human protein sequence SEQ ID 
NO:17163 - Homo sapiens, 572 aa. 
[EP1074617-A2, 07-FEB-2001]


403..947 
1..545 


545/545(100%) 
545/545(100%) 


0.0 


AAB58175 


Lung cancer associated polypeptide 
sequence SEQ ID 513 - Homo 
sapiens, 410 aa. [WO200055180-A2,
21-SEP-2000] 


571..947 
15..391 


369/377(97%) 
369/377 (97%) 


0.0 


AAG00950 


Human secreted protein, SEQ ID 
NO: 5031 - Homo sapiens, 100 aa. 
[EP1033401-A2, 06-SEP-2000] 


451..544
7..100 


76/94 (80%) 
79/94 (83%)


3e-36 



In a BLAST search of public sequence databases, the NOV53a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 53D. 



Table 53D. Public BLASTP Results for NOV53a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV53a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96A65 


CDNA FLJ14782 FIS, CLONE 
NT2RP4000524, HIGHLY SIMILAR 
TO MUS MUSCULUS SEC8 MRNA 
(SECRETORY PROTEIN SEC8) - 
Homo sapiens (Human), 974 aa. 


1..947 
1..947 


947/947 (100%) 
947/947 (100%)


0.0 


Q9C0G4 


KIAA1699 PROTEIN - Homo sapiens 
(Human), 966 aa (fragment). 


9..947
1..939 


939/939 (100%)
939/939(100%) 


0.0 


O35382


SEC8 - Mus musculus (Mouse), 971 
aa. 


1..971 
1..971 


923/972(94%) 
946/972(96%) 


0.0 


Q62824 


RSEC8 - Rattus norvegicus (Rat), 975 
aa (fragment). 


1..947 
1..948 


902/948(95%) 
925/948 (97%) 


0.0


Q9P102


REC8 - Homo sapiens (Human), 637 aa
(fragment).


339..947
2..610


609/609 (100%)
609/609 (100%)


0.0





PFam analysis predicts that the NOV53a protein contains the domains shown in the
Table 53E. 



Table 53E. Domain Analysis of NOV53a


Pfam Domain


NOV53a Match Region


Identities/
Similarities
for the Matched Region


Expect Value


No Significant Matches Found



Example 54. 

The NOV54 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 54A. 



Table 54A. NOV54 Sequence Analysis 




SEQ ID NO: 161


501 bp 


NOV54a, 

CG59545-01 DNA Sequence 


CAACACGAGGAACAATGTCTTCTTTACCCGTGCCATACAAACTGCCTGTGTCTTTGTC
TGTTGGTTCCTGCGTGATAATCAAAGGGACACTGATCGACTCTTCTATCAACGAACCA 
CAGCTGCAGGTGGATTTCTACACTGAGATGAATGAGGACTCAGAAATTGCCTTCCATT
TGCGAGTGCACTTAGGCCGTCGTGTGGTCATGAACAGTCGTGAGTTTGGGATATGGAT 
GTTGGAGGAGAATTTACACTATGTGCCCTTTGAGGATGGCAAACCATTTGACTTGCGC 
ATCTACGTGTGTCTCAATGAGTATGAGGTAAAGGTAAATGGTGAATACATTTATGCCT 
TTGTCCATCGAATCCCGCCATCATATGTGAAGATGATTCAAGTGTGGAGAGATGTCTC
CCTGGACTCAGTGCTTGTCAACAATGGACGGAGATGATCACACTCCTCATTGTTGAGG 
AAACCCTCTTTCTACCTGACCATGGGATTCCTAGAGC 




ORF Start: ATG at 15 


ORF Stop: TGA at 441 




SEQ ID NO: 162


142 aa MW at 16511.9kD 


NOV54a, 

CG59545-01 Protein Sequence 


MSSLPVPYKLPVSLSVGSCVIIKGTLIDSSINEPQLQVDFYTEMNEDSEIAFHLRVHL 
GRRVVMNSREFGIWMLEENLHYVPFEDGKPFDLRIYVCLNEYEVKVNGEYIYAFVHRI 
PPSYVKMIQVWRDVSLDSVLVNNGRR
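
The predicted polypeptide in each table follows from translating the listed DNA from the stated ORF start up to the stop codon. The sketch below illustrates that step with the standard genetic code; the function name is hypothetical, and the 1-based start coordinate is an assumption inferred from the "ORF Start" entries. Unrecognized characters, such as OCR damage in the listing, translate to "X".

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(dna: str, orf_start: int) -> str:
    """Translate from a 1-based ORF start until the first stop codon."""
    seq = "".join(dna.split()).upper()  # drop whitespace left in the listing
    protein = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" marks unreadable codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Opening bases of the NOV54a DNA listing, with the ATG at position 15.
nov54a_prefix = "CAACACGAGGAACAATGTCTTCTTTACCCGTGCCATACAAACTG"
print(translate_orf(nov54a_prefix, 15))  # MSSLPVPYKL
```

The ten residues printed agree with the start of the NOV54a protein sequence given in Table 54A.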



Further analysis of the NOV54a protein yielded the following properties shown in 
Table 54B. 



Table 54B. Protein Sequence Properties NOV54a 


PSort 
analysis: 


0.5500 probability located in endoplasmic reticulum (membrane); 0.1900 
probability located in lysosome (lumen); 0.1000 probability located in 
endoplasmic reticulum (lumen); 0.1000 probability located in outside 


SignalP 
analysis: 


No Known Signal Sequence Predicted



A search of the NOV54a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 54C.



Table 54C. Geneseq Results for NOV54a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV54a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG66741 


Human Charcot-Leyden crystal 
protein 5A (CLC5A) - Homo sapiens, 
142 aa. [CN1302875-A, 11-JUL-
2001]


1..142 
1..142 


139/142(97%) 
139/142(97%) 


2e-77 


AAG66742 


Human Charcot-Leyden crystal 
protein 5B (CLC5B) - Homo sapiens, 
170 aa. [CN1302875-A, 11-JUL- 
2001] 


6..142
34..170


136/137(99%) 
136/137 (99%) 


3e-76 


AAM79041 


Human protein SEQ ID NO 1703 - 
Homo sapiens, 139 aa. 
[WO200157190-A2, 09-AUG-2001] 


1..139 
1..139 


107/139 (76%) 
116/139 (82%) 


2e-56 


AAY28350 


Full Placental Protein 13 amino acid
sequence - Homo sapiens, 139 aa. 
[WO9938970-A1, 05-AUG-1999] 


1..139 
1..139 


107/139 (76%) 
116/139 (82%) 


2e-56 


AAG78627 


Human Charcot-Leyden crystal 4 
CLC4 protein #2 - Homo sapiens, 
167 aa. [CN1302876-A, 11-JUL- 
2001] 


6..139 
34..167


102/134 (76%) 
111/134 (82%) 


2e-53 



In a BLAST search of public sequence databases, the NOV54a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 54D. 



Table 54D. Public BLASTP Results for NOV54a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV54a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9UHV8 


PLACENTAL PROTEIN 13
(PLACENTA PROTEIN 13) - Homo
sapiens (Human), 139 aa. 


1..139 
1..139 


107/139 (76%) 
116/139 (82%) 


9e-56 


Q9NR03


PROTEIN - Homo sapiens (Human),
139 aa.


1..139


107/139 (76%)


9e-45

A46523 


Charcot-Leyden crystal protein - 
human, 142 aa. 


1..142 
1..142 


76/142 (53%) 
96/142 (67%) 


7e-36 


Q05315 


Eosinophil lysophospholipase (EC
3.1.1.5) (Charcot-Leyden crystal
protein) (Lysolecithin acylhydrolase)
(CLC) (Galectin-10) - Homo sapiens
(Human), 141 aa. 


2..142 
1..141 


75/141 (53%) 
95/141 (67%) 


3e-35 


Q96KD6 


PLACENTAL PROTEIN 13-LIKE -
Homo sapiens (Human), 104 aa
(fragment).


1..104
1..104 


66/104(63%) 
79/104(75%) 


1e-31



PFam analysis predicts that the NOV54a protein contains the domains shown in the 
Table 54E. 



Table 54E. Domain Analysis of NOV54a 


Pfam Domain 


NOV54a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


Gal-bind_lectin: domain 1
of 1 


5..137 


37/142(26%) 
106/142(75%) 


3.1e-28 



Example 55. 



The NOV55 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 55A. 



Table 55A. NOV55 Sequence Analysis 




SEQ ID NO: 163


2071 bp


NOV55a, 

CG59435-01 DNA Sequence 


AAACTATTTGTAGGCGCAGTCATGCAGGAAAACCTCAGATTTGCTTCATCAGGAGATG 

ATATTAAAATATGGGATGCTTCATCTATGACATTGGTGGATAAATTCAACCCACACAC 

ATCACCACATGGAATCAGCTCAATATGTTGGAGCAGCAATAGTAACTTTTTAGTAACA

GCATCTTCCAGTGGCGACAAAATAGTTGTCTCAAGTTGCAAATGTAAACCTGTTCCAC 

TTTTAGAGCTTGCTGAAGGGCAAAAGCAGACATGTGTCAATTTAAATTCTACATCTAT

GTATTTGGTAAGCGGAGGCCTAAATAACACTGTTAATATTTGGGATTTAAAATCAAAA

AGAGTTCATCGATCTCTTAAGGATCATAAAGATCAAGTAACTTGTGTAACATACAATT 

GGAATGATTGCTACATTGCTTCTGGATCTCTTAGTGGTGAAATTATTTTACACAGTGT 

AACCACTAATTTATCTAGTACTCCTTTTGGCCATGGTAGTAACCAGGTTCGGCACTTG

AAGTACTCCTTGTTTAAGAAATCACTACTGGGCAGTGTTTCGGATAATGGAATAGTAA

CTCTCTGGGATGTAAATAGTCAGAGTCCATACCATAACTTTGACAGTGTACACAAAGC 

TCCAGCGTCAGGCATCTGTTTTTCTCCTGTCAATGAATTGCTCTTTGTAACCATAGGC

TTGGATAAAAGAATCATCCTCTATGACACTTCAAGTAAGAAGCTAGTGAAAACTTTAG 

TGGCTGACACTCCTCTAACTGCGGTAGATTTCATGCCTGATGGAGCCACTTTGGCTAT

TGGATCTTCCCGGGGGAAAATATATCAATATGATTTAAGAATGTTGAAATCACCAGTT 

AAGACCATCAGTGCTCACAAGACATCTGTGCAGTGTATAGCATTTCAGTACTCCACTG

TTCTTACTAAGTCAAGTTTAAATAAAGGCTGTTCAAATAAGCCCACAACAGTGAACAA

ACGAAGTGTTAATGTGAATGCTGCTAGTGGAGGAGTTCAGAATTCCGGAATTGTCAGA 

GAAGCACCTGCCACGTCCATTGCCACAGTTCTACCACAACCTATGACATCAGCTATGG 

GGAAAGGAAGAGTTGCTGTTCAAGAAAAAGCAGGTTTGCCTCC

CACTTTATCTAAGGAAACAGACAGTGGAAAAAATCAGGATTTCTCCAGCTTTGATGAT
ACTGGGAAAAGTAGTTTAGGTGACATGTTCTCACCTATCAGAGATGATGCTGTAGTTA
ACAAGGGAAGTGATGAGTCCATAGGCAAAGGAGATGGCTTTGACTTTCTACCGCAGTT 
GAACTCAGTGTTTCCTCCAAGAAAAAATCCAGTAACTTCAAGTACTTCAGTATTGCAT 
TCTAGTCCTCTTAATGTTTTTATGGGATCTCCAGGGAAAGAGGAAAATGAAAACCGTG 
ATCTAACAGCTGAGTCTAAGAAAATATATATGGGAAAACAGGAATCTAAAGACTCCTT 
CAAACAGTTAGCAAAGTTGGTCACATCTGGTGCTGAAAGTGGAAATCTAAATACCTCT 
CCATCATCTAACCAAACAAGAAATTCTGAGAAATTTGAAAAGCCAGAGAATGAAATTG 
AAGCCCAGTTGATATGTGAACCCCCAATCAATGGATCCTCAACTCCAAATCCAAAGAT 
AGCATCTTCTGTCACTGCTGGAGTTGCCAGTTCACTCTCAGAAAAAATAGCCGACAGC 
ATTGGAAATAACCGGCAAAATGCACCATTGACTTCCATTCAAATTCGTTTTATTCAGA 
ACATGATACAGGAAACGTTGGATGACTTTAGAGAAGCATGCCATAGGGACATTGTGAA 
TTTGCAAGTGGAGATGATTAAACAGTTTCATATGCAACTGAATGAAATGCATTCTTTG 
CTGGAAAGATACTCAGTGAATGAAGGTTTAGTGGCTGAAATTGAAAGACTACGAGAAG
AAAACAAAAGATTACGGGCCCACTTTTGAAATTTCAGTGAATACCTTAATGTTCTGTA 
ATTTGGGAAGTTTCTGGCAACACAGAACTACATAGAATCAT 




ORF Start: ATG at 22 


ORF Stop: TGA at 1999 




SEQ ID NO: 164


659 aa 


MW at 71851.2kD


NOV55a, 

CG59435-01 Protein Sequence 


MQENLRFASSGDDIKIWDASSMTLVDKFNPHTSPHGISSICWSSNSNFLVTASSSGDK
IVVSSCKCKPVPLLELAEGQKQTCVNLNSTSMYLVSGGLNNTVNIWDLKSKRVHRSLK
DHKDQVTCVTYNWNDCYIASGSLSGEIILHSVTTNLSSTPFGHGSNQVRHLKYSLFKK
SLLGSVSDNGIVTLWDVNSQSPYHNFDSVHKAPASGICFSPVNELLFVTIGLDKRIIL
YDTSSKKLVKTLVADTPLTAVDFMPDGATLAIGSSRGKIYQYDLRMLKSPVKTISAHK
TSVQCIAFQYSTVLTKSSLNKGCSNKPTTVNKRSVNVNAASGGVQNSGIVREAPATSI
ATVLPQPMTSAMGKGTVAVQEKAGLPRSINTDTLSKETDSGKNQDFSSFDDTGKSSLG
DMFSPIRDDAVVNKGSDESIGKGDGFDFLPQLNSVFPPRKNPVTSSTSVLHSSPLNVF
MGSPGKEENENRDLTAESKKIYMGKQESKDSFKQLAKLVTSGAESGNLNTSPSSNQTR
NSEKFEKPENEIEAQLICEPPINGSSTPNPKIASSVTAGVASSLSEKIADSIGNNRQN
APLTSIQIRFIQNMIQETLDDFREACHRDIVNLQVEMIKQFHMQLNEMHSLLERYSVN
EGLVAEIERLREENKRLRAHF




SEQ ID NO: 165


2009 bp 


NOV55b, 

CG59435-02 DNA Sequence 


AAACTATTTGTAGGCGCAGTCATGCAGGAAAACCTCAGATTTGCTTCATCAGGAGATG 
ATATTAAAATATGGGATGCTTCATCTATGACATTGGTGGATAAATTCAACCCACACAC 
ATCACCACATGGAATCAGCTCAATATGTTGGAGCAGCAATAATAACTTTTTAGTAACA 
GCATCTTCCAGTGGCGACAAAATAGTTGTCTCAAGTTGCAAATGTAAACCTGTTCCAC 
TTTTAGAGCTTGCTGAAGGGCAAAAGCAGACATGTGTCAATTTAAATTCTACATCTAT 
GTATTTGGTAAGCGGAGGCCTAAATAACACTGTTAATATTTGGGATTTAAAATCAAAA 
AGAGTTCATCGATCTCTTAAGGATCATAAAGATCAAGTAACTTGTGTAACATACAATT 
GGAATGATTGCTACATTGCTTCTGGATCTCTTAGTGGTGAAATTATTTTACACAGTGT 
AACCACTAATTTATCTAGTACTCCTTTTGGCCATGGTAGTAACCAGTCTGTTCGGCAC 
TTGAAGTACTCCTTGTTTAAGAAATCACTACTGGGCAGTGTTTCGGATAATGGAATAG 
TAACTCTCTGGGATGTAAATAGTCAGAGTCCATACCATAACTTTGACAGTGTACACAA 
AGCTCCAGCGTCAGGCATCTGTTTTTCTCCTGTCAATGAATTGCTCTTTGTAACCATA 
GGCTTGGATAAAAGAATCATCCTCTATGACACTTCAAGTAAGAAGCTAGTGAAAACTT 
TAGTGGCTGACACTCCTCTAACTGCGGTAGATTTCATGCCTGATGGAGCCACTTTGGC 
TATTGGATCTTCCCGGGGGAAAATATATCAATATGATTTAAGAATGTTGAAATCACCA 
GTTAAGACCATCAGTGCTCACAAGACATCTGTGCAGTGTATAGCATTTCAGTACTCCA 
CTGTTCTTACTAAGTCAAGTTTAAATAAAGGCTGTTCAAATAAGCCCACAACAGTGAA
CAAACGAAGTGTTAATGTGAATGCTGCTAGTGGAGGAGTTCAGAATTCCGGAATTGTC 
AGAGAAGCACCTGCCACGTCCATTGCCACAGTTCTACCACAACCTATGACATCAGCTA 
TGGGGAAAGGAACAGTTGCTGTTCAAGAAAAAGCAGGTTTGCCTCGAAGCATAAACAC
AGACACTTTATCTAAGGAAACAGACAGTGGAAAAAATCAGGATTTCTCCAGCTTTGAT
GATACTGGGAAAAGTAGTTTAGGTGACATGTTCTCACCTATCAGAGATGATGCTGTAG 
TTAACAAGGGAAGTGATGAGTCCATAGGCAAAGGAGATGGCTTTGACTTTCTACCGCA 
GTTGAACTCAGTGTTTCCTCCAAGAAAAAATCCAGTAACTTCAAGTACTTCAGTATTG
CATTCTAGTCCTCTTAATGTTTTTATGGGATCTCCAGGGAAAGAGGAAAATGAAAACC 
GTGATCTAACAGCTGAGTCTAAGAAAATATATATGGGAAAACAGGAATCTAAAGACTC 
CTTCAAACAGTTAGCAAAGTTGGTCACATCTGGTGCTGAAAGTGGAAATCTAAATACC
TCTCCATCATCTAACCAAACAAGAAATTCTGAGAAATTTGAAAAGCCAGAGAATGAAA
TTGAAGCCCAGTTGATATGTGAACCCCCAATCAATGGATCCTCAACTCCAAATCCAAA
GATAGCATCTTCTGTCACTGCTGGAGTTGCCAGTTCACTCTCAGAAAAAATAGCCGAC 
AGCATTGGAAATAACCGGCAAAATGCACCATTGACTTCCATTCAAATTCGTTTTATTC
AGAACATGATACAGGAAACGTTGGATGACTTTAGAGAAGCATGCCATAGGGACATTGT 
GAATTTGCAAGTGGAGATGATTAAACAGTTTCATATGCAACTGAATGAAATGCATTCT 
TTGCTGGAAAGATACTCAGTGAATGAAGGTTTAGTGGCTGAAATTGAAAGACTACGAG 
AAGAAAACAAAAGATTACGGGCCCACTTTTGAAATTT




ORF Start: ATG at 22 


ORF Stop: TGA at 2002 




SEQ ID NO: 166 


660 aa 


MW at 71965.3kD


NOV55b, 

CG59435-02 Protein Sequence 


MQENLRFASSGDDIKIWDASSMTLVDKFNPHTSPHGISSICWSSNNNFLVTASSSGDK
IVVSSCKCKPVPLLELAEGQKQTCVNLNSTSMYLVSGGLNNTVNIWDLKSKRVHRSLK
DHKDQVTCVTYNWNDCYIASGSLSGEIILHSVTTNLSSTPFGHGSNQSVRHLKYSLFK
KSLLGSVSDNGIVTLWDVNSQSPYHNFDSVHKAPASGICFSPVNELLFVTIGLDKRII
LYDTSSKKLVKTLVADTPLTAVDFMPDGATLAIGSSRGKIYQYDLRMLKSPVKTISAH
KTSVQCIAFQYSTVLTKSSLNKGCSNKPTTVNKRSVNVNAASGGVQNSGIVREAPATS
IATVLPQPMTSAMGKGTVAVQEKAGLPRSINTDTLSKETDSGKNQDFSSFDDTGKSSL
GDMFSPIRDDAVVNKGSDESIGKGDGFDFLPQLNSVFPPRKNPVTSSTSVLHSSPLNV
FMGSPGKEENENRDLTAESKKIYMGKQESKDSFKQLAKLVTSGAESGNLNTSPSSNQT
RNSEKFEKPENEIEAQLICEPPIKGSSTPNPKIASSVTAGVASSLSEKIADSIGNNRQ
NAPLTSIQIRFIQNMIQETLDDFREACHRDIVNLQVEMIKQFHMQLNEMHSLLERYSV
NEGLVAEIERLREENKRLRAHF
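
The ORF coordinates and the stated protein length in each sequence table can be cross-checked arithmetically. The sketch below assumes that "ORF Start" is the 1-based position of the first base of the ATG and that "ORF Stop" is the position of the first base of the stop codon; this reading is inferred from the numbers in the tables, not stated in the text, and the function name is illustrative.

```python
def orf_codons(orf_start: int, orf_stop: int) -> int:
    """Number of coding codons between the start ATG and the stop codon.

    Both arguments are assumed to be 1-based positions of the first base
    of the respective codon.
    """
    span = orf_stop - orf_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3


# NOV55a: ATG at 22, TGA at 1999; NOV55b: ATG at 22, TGA at 2002.
print(orf_codons(22, 1999))  # 659
print(orf_codons(22, 2002))  # 660
```

Under that assumption the coordinates of NOV55a and NOV55b give 659 and 660 codons, matching the 659 aa and 660 aa lengths listed for the two proteins.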



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 55B. 



Table 55B. Comparison of NOV55a against NOV55b. 


Protein Sequence 


NOV55a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV55b 


1..659
1..660 


658/660 (99%) 
659/660 (99%) 
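
Each "Identities/Similarities" cell pairs a count over the aligned length with a whole-number percentage. The sketch below parses such a cell and recomputes the percentage. It assumes the percentage is rounded down, which is consistent with entries such as 658/660 (99%) but is an inference rather than something the specification states; the function names are illustrative.

```python
import re

CELL = re.compile(r"(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def parse_cell(cell: str) -> tuple[int, int, int]:
    """Split a cell such as '658/660 (99%)' into (matches, aligned, percent)."""
    m = CELL.search(cell)
    if m is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    matches, aligned, percent = (int(g) for g in m.groups())
    return matches, aligned, percent


def percent_floor(matches: int, aligned: int) -> int:
    """Whole-number percentage, rounded down (assumed convention)."""
    return matches * 100 // aligned


matches, aligned, listed = parse_cell("658/660 (99%)")
print(percent_floor(matches, aligned) == listed)  # True
```

A cell whose recomputed value differs from the listed one is a candidate for a transcription error in either the counts or the percentage.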



Further analysis of the NOV55a protein yielded the following properties shown in 
Table 55C. 



Table 55C. Protein Sequence Properties NOV55a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV55a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 55D.



Table 55D. Geneseq Results for NOV55a


Geneseq
Identifier


Protein/Organism/Length [Patent #,
Date]


NOV55a
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG74568 


Human colon cancer antigen protein 
SEQ ID NO:5332 - Homo sapiens, 
404 aa. [WO200122920-A2, 05-APR-
2001]


256..659
1..404


399/404 (98%) 
399/404 (98%) 


0.0 


AAE10677


Homo sapiens, 208 aa.
[WO200172955-A2, 04-OCT-2001]


2..159


149/159 (93%)


4e-75



AAM70774 


Human bone marrow expressed probe 
encoded protein SEQ ID NO: 31080 - 
Homo sapiens, 67 aa. 
[WO200157276-A2, 09-AUG-2001] 


240..306 
1..67 


67/67 (100%) 
67/67 (100%) 


9e-31 


AAM06190 


Peptide #4872 encoded by probe for 
measuring breast gene expression - 
Homo sapiens, 67 aa. 
[WO200157270-A2, 09-AUG-2001] 


240..306 
1..67 


67/67 (100%) 
67/67 (100%) 


9e-31 


ABB23122 


Protein #5121 encoded by probe for 
measuring heart cell gene expression - 
Homo sapiens, 65 aa. 
[WO200157274-A2, 09-AUG-2001]


307..371 
1..65 


65/65(100%) 
65/65 (100%) 


3e-29 


In a BLAST search of public sequence databases, the NOV55a protein was found to
have homology to the proteins shown in the BLASTP data in Table 55E.




Table 55E. Public BLASTP Results for NOV55a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV55a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


I60167


regulatory protein Nedd1 - mouse, 660
aa. 


1..659 
1..660 


564/660(85%) 
607/660 (91%) 


0.0 


P33215 


NEDD1 protein - Mus musculus 
(Mouse), 675 aa (fragment). 


1..659 
16..675 


564/660 (85%) 
607/660 (91%) 


0.0 


Q9CWK2 


NEURAL PRECURSOR CELL
EXPRESSED,
DEVELOPMENTALLY DOWN-
REGULATED GENE 1 - Mus
musculus (Mouse), 660 aa.


1..659 
1..660 


563/660 (85%) 
606/660 (91%) 


0.0 


Q9FI89 


SIMILARITY TO REGULATORY 
PROTEIN NEDD1 - Arabidopsis 
thaliana (Mouse-ear cress), 787 aa. 


8..533
15..532 


145/550(26%) 
246/550 (44%) 


4e-40 


BAB75165 


WD-40 REPEAT PROTEIN - 
Anabaena sp. (strain PCC 7120), 1526 
aa. 


2..298 
916..1208 


92/307 (29%) 
147/307 (46%) 


2e-18 



PFam analysis predicts that the NOV55a protein contains the domains shown in the
Table 55F.



Table 55F. Domain Analysis of NOV55a


Pfam Domain


NOV55a Match Region


Identities/
Similarities
for the Matched Region


Expect Value


WD40: domain 1 of 7


28..61


6/37 (16%)
27/37 (73%)


57


WD40: domain 2 of 7


70..105


10/37 (27%)
27/37 (73%)


0.062


WD40: domain 3 of 7


111..147


9/37 (24%)
28/37 (76%)


20


WD40: domain 4 of 7


151..190


10/38 (26%)
29/38 (76%)


3.4


WD40: domain 5 of 7


197..234


7/38 (18%)
25/38 (66%)


19


WD40: domain 6 of 7


240..275 


14/37 (38%) 
28/37(76%) 


3.1 


WD40: domain 7 of 7 


282..316 


8/37 (22%) 
26/37 (70%) 


1.3e+03 



Example 56. 

The NOV56 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 56A. 



Table 56A. NOV56 Sequence Analysis 




SEQ ID NO: 167


1771 bp 


NOV56a, 

CG59439-01 DNA Sequence 


GACTGTTTCACCATGCAGTGGCTAATGAGGTTCCGGACCCTCTGGGGCATCCACAAAT 
CCTTCCACAACATCCACCCTGCCCCTTCACAGCTGCGCTGCCGGTCTTTATCAGAATT 
TGGAGCCCCAAGATGGAATGACTATGAAGTACCGGAGGAATTTAACTTTGCAAGTTAT 
GTACTGGACTACTGGGCTCAAAAGGAGAAGGAGGGCAAGAGAGGTCCAAATCCAGCTT 
TTTGGTGGGTGAATGGCCAAGGGGATGAAGTAAAGTGGAGCTTCAGAGAGATGGGAGA 
CCTAACCCGCCGTGTAGCCAACGTCTTCACACAGACCTGTGGCCTACAACAGGGAGAC 
CATCTGGCCTTGATGCTGCCTCGAGTTCCTCAGTGGTGGCTGGTGGCTGTGGGCTGCA
TGCGAACAGGGATCATCTTCATTCCTGCGACCATCCTGTTGAAGGCCAAAGACATTCT
CTATCGACTACAGTTGTCTAAAGCCAAGGGCATTGTGACCATAGATGCCCTTGCCTCA
GAGGTGGACTCCATAGCTTCTCAGTGCCCCTCTCTGAAAACCAAGCTCCTGGTGTCTG 
ATCACAGCCGTGAAGGGTGGCTGGACTTCCGATCGCTGGTTAAATCAGCATCCCCAGA
ACACACCTCTGTTAAGTCAAAGACCTTGGACCCAATGGTCATCTTCTTCACCAGTGGG 
ACCACAGGCTTCCCCAAGATGGCAAAACACTCCCATGGGTTGGCCTTACAACCCTCCT 
TCCCAGGAAGTAGGAAATTACGGAGCCTGAAGACATCTGATGTCTCCTGGTGCCTGTC
GGACTCAGGATGGATTGTGGCTACCATTTGGACCCTGGTAGAACCATGGACAGCGGGT 
TGTACAGTCTTTATCCACCATCTGCCACAGTTTGACACCAAGGTCATCATACAGACAT
TGTTGAAATACCCCATTAACCACTTTTGGGGGGTATCATCTATATATCGAATGATTCT 
GCAGCAGGATTTCACCAGCATCAGGTTCCCTGCCCTGGAGCACTGCTATACTGGCGGG
GAGGTCGTGTTGCCCAAGGATCAGGAGGAGTGGAAAAGACGGACGGGCCTTCTGCTCT 
ACGAGAACTATGGGCAGTCGGAAACGGGACTAATTTGTGCCACCTACTGGGGAATGAA
GATCAAGCCGGGTTTCATGGGGAAGGCCACTCCACCCTACGACGTCCAGGTCATTGAT 
GACAAGGGCAGCATCCTGCCACCTAACACAGAAGGAAACATTGGCATCAGAATCAAAC
CTGTCAGGCCTGTGAGCCTCTTCATGTGCTATGAGGGTGACCCAGAGAAGACAGCTAA 
AGTGGAATGTGGGGACTTCTACAACACTGGGGACAGAGGTAAGATGGATGAAGAGGGC 
TACATTTGTTTCCTGGGGAGGAGTGATGACATCATTAATGCCTCTGGGTATCGCATCG 
GGCCTGCAGAGGTTGAAAGCGCTTTGGTGGAGCACCCAGCGGTGGCGGAGTCAGCCGT 
GGTGGGCAGCCCAGACCCGATTCGAGGGGAGGTGGTGAAGGCCTTTATTGTCCTGACC
CCACAGTTCCTGTCCCATGACAAGGATCAGCTGACCAAGGAACTGCAGCAGCATGTCA
AGTCAGTGACAGCCCCATACAAGTACCCAAGGAAGGTGGAGTTTGTCTCAGAGCTGCC
AAAAACCATCACTGGCAAGATTGAACGGAAGGAACTTCGGAAAAAGGAGACTGGTCAG 
ATGTAATCGGCAGTGAACTCAGAACGCACTG




ORF Start: ATG at 13


ORF Stop: TAA at 1744




SEQ ID NO: 168


577 aa 


MW at 65272.6kD 


NOV56a, 

CG59439-01 Protein Sequence


MQWLMRFRTLWGIHKSFHNIHPAPSQLRCRSLSEFGAPRWNDYEVPEEFNFASYVLDY
WAQKEKEGKRGPNPAFWWVNGQGDEVKWSFREMGDLTRRVANVFTQTCGLQQGDHLAL
MLPRVPEWWLVAVGCMRTGIIFIPATILLKAKDILYRLQLSKAKGIVTIDALASEVDS
IASQCPSLKTKLLVSDHSREGWLDFRSLVKSASPEHTCVKSKTLDPMVIFFTSGTTGF
PKMAKHSHGLALQPSFPGSRKLRSLKTSDVSWCLSDSGWIVATIWTLVEPWTAGCTVF
IHHLPQFDTKVIIQTLLKYPINHFWGVSSIYRMILQQDFTSIRFPALEHCYTGGEVVL
PKDQEEWKRRTGLLLYENYGQSETGLICATYWGMKIKPGFMGKATPPYDVQVIDDKGS
ILPPNTEGNIGIRIKPVRPVSLFMCYEGDPEKTAKVECGDFYNTGDRGKMDEEGYICF
LGRSDDIINASGYRIGPAEVESALVEHPAVAESAVVGSPDPIRGEVVKAFIVLTPQFL
SHDKDQLTKELQQHVKSVTAPYKYPRKVEFVSELPKTITGKIERKELRKKETGQM




SEQ ID NO: 169


1659 bp 


NOV56b, 

CG59439-02 DNA Sequence 


GTTTCACCATGCAGTGGCTAATGAGGTTCCGGACCCTCTGGGGCATCCACAAATCCTT 
CCACAACATCCACCCTGCCCCTTCACAGCTCCGCTGCCGGTCTTTATCAGAATTTGGA
GCCCCAAGATGGAATGACTATGAAGTACCGGAGGAATTTAACTTTGCAAGTTATGTAC 
TGGACTACTGGGCTCAAAAGGAGAAGGAGGGCAAGAGAGGTCCAAATCCAGCTTTTTG 
GTGGGTGAATGGCCAAGGGGATGAAGTAAAGTGGAGCTTCAGAGAGATGGGAGACCTA 
ACCCGCCGTGTAGCCAACGTCTTCACACAGACCTGTGGCCTACAACAGGGAGACCATC 
TGGCCTTGATGCTGCCTCGAGTTCCTGAGTGGTGGCTGGTGGCTGTGGGCTGCATGCG
AACAGGGATCATCTTCATTCCTGCGACCATCCTGTTGAAGGCCAAAGACATTCTCTAT
CGACTACAGTTGTCTAAAGCCAAGGGCATTGTGACCATAGATGCCCTTGCCTCAGAGG 
TGGACTCCATAGCTTCTCAGTGCCCCTCTCTGAAAACCAAGCTCCTGGTGTCTGATCA 
CAGCCGTGAAGGGTGGCTGGACTTCCGATCGCTGGTTAAATCAGCATCCCCAGAACAC
ACCTGTGTTAAGTCAAAGACCTTGGACCCAATGGTCATCTTCTTCACCAGTGGGACCA
CAGGCTTCCCCAAGATGGCAAAACACTCCCATGGGTTGGCCTTACAACCCTCCTTCCC 
AGGAAGTAGGAAATTACGGAGCCTGAAGACATCTGATGTCTCCTGGTGCCTGTCGGAC 
TCAGGATGGATTGTGGCTACCATTTGGACCCTGGTAGAACCATGGACAGCGGGTTGTA 
CAGTCTTTATCCACCATCTGCCACAGTTTGACACCAAGGTCATCATACAGACATTGTT 
GAAATACCCCATTAACCACTTTTGGGGGGTATCATCTATATATCGAATGATTCTGCAG 
CAGGATTTCACCAGCATCAGGTTCCCTGCCCTGGAGCACTGCTATACTGGCGGGGAGG
TCGTGTTGCCCAAGGATCAGGAGGAGTGGAAAAGACGGACGGGCCTTCTGCTCTACGA 
GAACTATGGGCAGTCGGAAACGGGACTAATTTGTGCCACCTACTGGGGAATGAAGATC 
AAGCCGGGTTTCATGGGGAAGGCCACTCCACCCTACGACGTCCAGGGTGACCCAGAGA 
AGACAGCTAAAGTGGAATGTGGGGACTTCTACAACACTGGGGACAGAGGAAAGATGGA 
TGAAGAGGGCTACATTTGTTTCCTGGGGAGGAGTGATGACATCATTAATGCCTCTGGG 
TATCGCATCGGGCCTGCAGAGGTTGAAAGTGCTTTGGTGGAGCACCCAGCGGTGGCGG 
AGTCAGCCGTGGTGGGCAGCCCAGACCCGATTCGAGGGGAGGTGGTGAAGGCCTTTAT
TGTCCTGACCCCACAGTTCCTGTCCCATGACAAGGATCAGCTGACCAAGGAACTGCAG
CAGCATGTCAAGTCAGTGACAGCCCCATACAAGTACCCAAGGAAAGTGGAGTTTGTCT
CAGAGCTGCCAAAAACCATCACTGGCAAGATTGAACGGAAGGAACTTCGGAAAAAGGA 
GACTGGTCAGATGTAATCGGCAGTGAACTCAGAAC 




ORF Start: ATG at 9 


ORF Stop: TAA at 1638 




SEQ ID NO: 170


543 aa 


MW at 61518.2kD


NOV56b, 

CG59439-02 Protein Sequence 


MQWLMRFRTLWGIHKSFHNIHPAPSQLRCRSLSEFGAPRWNDYEVPEEFNFASYVLDY
WAQKEKEGKRGPNPAFWWVNGQGDEVKWSFREMGDLTRRVANVFTQTCGLQQGDHLAL
MLPRVPEWWLVAVGCMRTGIIFIPATILLKAKDILYRLQLSKAKGIVTIDALASEVDS
IASQCPSLKTKLLVSDHSREGWLDFRSLVKSASPEHTCVKSKTLDPMVIFFTSGTTGF
PKMAKHSHGLALQPSFPGSRKLRSLKTSDVSWCLSDSGWIVATIWTLVEPWTAGCTVF
IHHLPQFDTKVIIQTLLKYPINHFWGVSSIYRMILQQDFTSIRFPALEHCYTGGEVVL
PKDQEEWKRRTGLLLYENYGQSETGLICATYWGMKIKPGFMGKATPPYDVQGDPEKTA
KVECGDFYNTGDRGKMDEEGYICFLGRSDDIINASGYRIGPAEVESALVEHPAVAESA
VVGSPDPIRGEVVKAFIVLTPQFLSHDKDQLTKELQQHVKSVTAPYKYPRKVEFVSEL
PKTITGKIERKELRKKETGQM



Sequence comparison of the above protein sequences yields the following sequence
relationships shown in Table 56B.



Table 56B. Comparison of NOV56a against NOV56b.


Protein Sequence 


NOV56a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV56b 


1..577
1..543 


543/577 (94%) 
543/577 (94%) 



Further analysis of the NOV56a protein yielded the following properties shown in 
Table 56C. 



Table 56C. Protein Sequence Properties NOV56a


PSort 
analysis: 


0.6400 probability located in microbody (peroxisome); 0.4712 probability located 
in mitochondrial matrix space; 0.1737 probability located in mitochondrial inner 
membrane; 0.1737 probability located in mitochondrial intermembrane space


SignalP 
analysis: 


Likely cleavage site between residues 23 and 24 
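
A predicted cleavage site "between residues 23 and 24" implies a 23-residue signal peptide followed by a mature chain starting at residue 24. The sketch below shows that split on the opening residues of the NOV56a sequence; the function name is illustrative, and the split only restates the SignalP prediction without confirming that cleavage occurs.

```python
def split_at_cleavage(protein: str, last_signal_residue: int) -> tuple[str, str]:
    """Split a chain after the given 1-based residue number."""
    seq = "".join(protein.split())  # tolerate whitespace in the listing
    if not 0 < last_signal_residue < len(seq):
        raise ValueError("cleavage position lies outside the sequence")
    return seq[:last_signal_residue], seq[last_signal_residue:]


# Opening residues of the NOV56a protein sequence from Table 56A.
signal, mature = split_at_cleavage("MQWLMRFRTLWGIHKSFHNIHPAPSQLRCRSLSEF", 23)
print(signal)  # MQWLMRFRTLWGIHKSFHNIHPA
print(mature)  # PSQLRCRSLSEF
```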



A search of the NOV56a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 56D.



Table 56D. Geneseq Results for NOV56a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV56a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB43245 


Human ORFX ORF3009 polypeptide 
sequence SEQ ID NO:6018 - Homo 
sapiens, 537 aa. [WO200058473-A2, 
05-OCT-2000] 


46..573 
1..527 


309/529 (58%) 
402/529 (75%) 


0.0 


AAM80008


Human protein SEQ ID NO 3654 - 
Homo sapiens, 302 aa. 
[WO200157190-A2, 09-AUG-2001] 


331..577 
24..302 


247/279 (88%) 
247/279 (88%) 


e-140 


AAM80007


Human protein SEQ ID NO 3653 - 
Homo sapiens, 302 aa. 
[WO200157190-A2, 09-AUG-2001] 


331..577 
24..302 


247/279 (88%) 
247/279 (88%) 


e-140 


AAM41894 


Human polypeptide SEQ ID NO 
6825 - Homo sapiens, 390 aa. 
[WO200153312-A1, 26-JUL-2001] 


257..573 
7..323 


193/317(60%) 
246/317(76%) 


e-116 


AAM79024


Homo sapiens, 196 aa.
[WO200157190-A2, 09-AUG-2001]


1..196


196/196 (100%)


e-112









In a BLAST search of public sequence databases, the NOV56a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 56E. 



Table 56E. Public BLASTP Results for NOV56a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV56a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value


Q96A20 


MIDDLE-CHAIN ACYL-COA 
SYNTHETASE1 (MEDIUM-CHAIN 
ACYL-COA SYNTHETASE) - Homo 
sapiens (Human), 577 aa. 


1..577 
1..577 


576/577 (99%) 
576/577 (99%) 


0.0 


Q9TVB5 


XENOBIOTIC/MEDIUM-CHAIN 

XL-III PRECURSOR - Bos taurus 
(Bovine), 577 aa. 


1..576 

1 

/O 


439/576 (76%) 

*tOO/-> /O ^o*f /o) 


0.0 


Q9BEA2 


LIPOATE-ACTIVATING ENZYME 
PRECURSOR - Bos taurus (Bovine), 
577 aa. 


1..576 
1..576


438/576 (76%) 
485/576 (84%) 


0.0 


Q91VA0 


MEDIUM-CHAIN ACYL-COA 
SYNTHETASE (EC 6.2.1.2) 
(HYPOTHETICAL 64.8 KDA 
PROTEIN) - Mus musculus (Mouse), 
573 aa. 


1..577 
1..573 


406/577 (70%) 
472/577 (81%) 


0.0 


O70490


KIDNEY-SPECIFIC PROTEIN - 
Rattus norvegicus (Rat), 572 aa. 


1..573 
1..567 


315/580(54%) 
417/580 (71%) 


0.0 



PFam analysis predicts that the NOV56a protein contains the domains shown in the 
Table 56F. 



Table 56F. Domain Analysis of NOV56a 


Pfam Domain 


NOV56a Match 
Region 


Identities/
Similarities
for the Matched
Region


Expect
Value


AMP-binding: domain 1 of
1


87..499


106/425 (25%)
299/425 (70%)


2.5e-96



Example 57. 

The NOV57 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 57A.



Table 57A. NOV57 Sequence Analysis




SEQ ID NO: 171


2501 bp 


NOV57a, 

CG59354-01 DNA Sequence 


ACACCATGACCACCCTTGATGATAAGTTGCTGGGGGAGAAACTGCAGTACTACTATAG
CAGCAGTGAGGATGAGGACAGTGACCACGAGGACAAGGACCGAGGCAGATGTGCCCCA
GCCAGCAGTTCTGTGCCTGCAGAGGCTGAGCTGGCAGGCGAAGGCATCTCAGTTAACA 
CAATGACTCTGAAGGAGTTTGCCATAATGAATGAGGACCAAGATGATGAAGAGTTTCT 
GCAGCAGTACCGGAAGCAGCGAATGGAAGAGATGCGGCAGCAGCTTCACAAGGGGCCC 
CAATTCAAGCAGGTTTTTGAGATCTCCAGTGGAGAAGGGTTTTTAGACATGATTGATA
AAGAACAGAAAAGCATTGTCATCATGGTTCATATTTATGAGGATGGCATTCCAGGGAC
CGAAGCCATGAATGGTTGCATGATCTGCCTTGCCGCAGAGTACCCAGCTGTCAAGTTC
TGCAAGGTGAAGAGCTCAGTTATTGGCGCCAGCAGTCAGTTCACCAGGAATGCCCTTC 
CTCCCCTGCTGATCTATAAGGGGGGTGAATTGATCGGCAATTTTGTTCGTGTTACTGA
CCAGCTGGGGGATGATTTCTTTGCTGTGGACCTTGAAGCTTTTCTCCAGGAATTTGGA
TTACTCCCAGAAAAGGAAGTCTTGGTGCTCACATCTGTGCGTAACTCTGCCACGTGTC 
ACAGTGAGGATAGCGACCTGGAAATAGATTGAACTGATAGTCTAGTTGCATAGATTTC
TCATTGTTTGGGTTGGAATACACGTCATTGTTTATTTTTGTTCCTTTGTCTTCTGGCT 


TTTCAGCTGTTCTTTGTAGTCCCTTTTATTATGCATAAAATAAAGAAATTCTTAGATT 


AAATCAGAATGCTGAATAACCTTGTAGCTAGCAATAAGGTGACTTACAATTGTATAAA 


CAGGAAGCCAGGCTTTTGAACTGTTTACTTAAGATTCTGTGGTGTGACATCTCTGTTA 


TTGTTTCCAGTCAATATTTACAAAGCATCCTAAAGACAGGGTCTTGGAAATTGTCTTC 


AGATGATCTTAGAGGTCTCTGCCAAGTCTGAGAGTATAATTCTGTAGGTATTGTGTTA 


TTTGCAACGTAAATAGTGCATTTTCTTAATCAAATGATTGTAAATTATATTTACTTGT




TTTAAAGGGGTGAGCCACCGCACCCAGTCCTGAGGGGTGGCCTCTGCTCCTGGATTTC 


ATGTCTTCCTCCAGCATGACTAAGTCTGGAACAGCAGGAAGGGTTGATGCTTACTGAC 


CTGGTGATGTTAGAAGACAAGTAGTTTATGGATTTAAACATTAGAGCTGGAGTGGGGC 


TGGAAATCTTTGTAAAG^AAGTTCTTTCAGTAAGATGCCCCTGCTTGTCTTTGTCTCT 


TTTTTGTTTAACAAGGTAACTTTTTGTTTAACAAGGTAACTTTTTGTTTAACCTAGAT 


TTTTTTTAAAACTTTTri u i ri TTTTCATATTGGAAAAGTAATTCATATTCAGTAGAGG 


AAAACTCACCAAAACAGAAGCAAAAATAAGAAAATTAAAATAATCTCTAATCCTACTA 


CCTAGAATAAAACACTATTAATATTTTGGTCTGTTTCCTGCCAAGGTGTTTTCTGTGT 


ATACATGGATATTTTGTTTGTTTTTAAACAAAACGATGGGATCATTCTGAACATACTG 


TTCTATAGTATGGTCAGCTAATAATATATCAGACCTTTTTTTTATATTATTAAATATT


CTACAACTTTTTAAAAATGTCTATTAATATTCCATCGTATAGATGTGATATAATTTGC 


TTGATGGTTGTCTCTTAAAAAGAAAGATAGCAAATACTTTTTTTAAATTACAAAAGTG


ATAGATGTTCATTGTAGAAAATGTAATAAACACTGTTAAGACTTAAAAGCCATATAAT 


TCCACCAACCAAAATTAATCCCTTTTGTCATATTTCTAGTCATTTTTATAGCCTTTTT 


TTTCTATGTATTTATAATAATTATCATTTGCGTTTTTTTCCTTTTTTTAACTTTAAAA 


ATGTATATTCTAGGGTCAGGGGAAATGTAATCTGGAATTAAATATTAGCCTTAAAATT 


CACAATTTTGATTTTCCTGGCTTTTCAGGAATTGACTAACTGTAAAAGAGTCTTGAAA 




GTAGAAAGGGTGGAATGTATTGAAAATTATTAGAAGCAGGGAAGTATTGTTAGTCTAG


CTTATTTCCTTTCAGTCTTTTTTCAATATTTTT 


TAGTTCTGTGCTCTTCCTTATTTAGTGTTGTATCATAAATACTTTGATGTTTCAAACA 


TTCTAAATAAATAATTTTCAGTGGCTTCATAATAAAAAAAAAAAAAAAAAAAAAAAAA 


AAAAAAA 




ORF Start: ATG at 6 


ORF Stop: TGA at 726 




SEQ ID NO: 172 


240 aa MW at 26866.9kD 


NOV57a, 

CG59354-01 Protein Sequence 


MTTLDDKLLGEKLQYYYSSSEDEDSDHEDKDRGRCAPASSSVPAEAELAGEGISVNTM 
TLKEFAIMNEDQDDEEFLQQYRKQRMEEMRQQLHKGPQFKQVFEISSGEGFLDMIDKE 
QKSIVIMVHIYEDGIPGTEAMNGCMICLAAEYPAVKFCKVKSSVIGASSQFTRNALPA 
LLIYKGGELIGNFVRVTDQLGDDFFAVDLEAFLQEFGLLPEKEVLVLTSVRNSATCHS 
EDSDLEID 




SEQ ID NO: 173 


893 bp 


NOV57b, 

CG59354-02 DNA Sequence 


CACCATGACCACCCTGTATGATAAGTTGCTGGGGGAGAAACTGCAGTACTACTATAGC 


AGCAGTGAAGATGAGGACAGTGACCACGAGGACAAGGACCGAGGCATCTCAGTTAACA 

CAGGCCCAAAAGGTGTGATCAATGACTGGCGCCGCTTCAAGCAGTTGGAGACAGAGCA 

GAGGGAGGAGCAGTGCCGGGAGATGGAAAGGCTGATCAAGAAGCTGTCAATGACTTGC 

AGGTCCCATCTGGATGAAGAGGAGGAGCAACAGAAACAGAAAGACCTCCAGGAGAAGA 

TCAGTGGGAAGATGACTCTGAAGGAGTTTGCCATAATGAATGAGGACCAAGATGATGA 

AGAGTTTCTGCAGCAGTACCGGAAGCAGCGAATGGAAGAGATGCGGCAGCAGCTTCAC 

AAGGGGCCCCAATTCAAGCAGGTTTTTGAGATCTCCAGTGGAGAAGGGTTTTTAGACA 

TGATTGATAAAGAACAGAAAAGCATTGTCATCATGGTTCATATTTATGAGGATGGCAT 

CAGGGACCGAAGCCATGAATGGTTGCATGATCCGCCTTGCAAGGGGGGTGAATTGATC 

GGCAATTTTGTTCGTGTTACTGACCAGCTCGGGGATX3ATTTCTTTGCT 

AAGCTTTTCTCCAGGAATTTGGATTACTCCCAGAAAAGGAAGTCTTGGTGCTGACATC 

TGTGCGTAACTCTGCCACGTGTCACAGTGAGGATAGCGACCTGGAAATAGATTGAACT 

GATAGTCTAGTTCC^TATAGATTTCTCATTGTTTGGGTTGGAATACACCATTGTTTAT 


TTTTGTTCCTTTGTCTTCTGGCTTTTCAGCTGTTCTTTGTAGTCCCTTTTATTATGCA 


TAAAATAAAGAAATTCTTAGATT 




ORF Start: ATG at 5 


ORF Stop: TGA at 749 




SEQ ID NO: 174 


248 aa 


MW at 29227.4kD 


NOV57b, 

CG59354-02 Protein Sequence 


MTTLYDKLLGEKLQYYYSSSEDEDSDHEDKDRGISVNTGPKGVINDWRRFKQLETEQR 
EEQCREMERLIKKLSMTCRSHLDEEEEQQKQKDLQEKISGKMTLKEFAIMNEDQDDEE 
FLQQYRKQRMEEMRQQLHKGPQFKQVFEISSGEGFLDMIDKEQKSIVIMVHIYEDGIR 
DRSHEWLHDPPCKGGELIGNFVRVTDQLGDDFFAVDLEAFLQEFGLLPEKEVLVLTSV 
RNSATCHSEDSDLEID 




SEQ ID NO: 175 


891 bp 


NOV57c, 

CG59354-03 DNA Sequence 


CACCATGACCACCCTGTATGATAAGTTGCTGGGGGAGAAACTGCAGTACTACTATAGC 
AGCAGTGAAGATGAGGACAGTGACCACGAGGACAAGGACCGAGGCATCTCAGTTAACA 
CAGGCCCAAAAGGTGTGATCAATGACTGGCGCCGCTTCAAGCAGTTGGAGACAGAGCA 
GAGGGAGGAGCAGTGCCGGGAGATGGAAAGGCTGATCAAGAAGCTGTCAATGACTTGC 
AGGTCCCATCTGGATGAAGAGGAGGAGCAACAGAAACAGAAAGACCTCCAGGAGAAGA 
TCAGTGGGAAGATGACTCTGAAGGAGTTTGCCATAATGAATGAGGACCAAGATGATGA 
AGAGTTTCTGCAGCAGTACCGGAAGCAGCGAATGGAAGAGATGCGGCAGCAGCTTCAC 
AAGGGGCCCCAATTCAAGCAGGTTTTTGAGATCTCCAGTGGAGAAGGGTTTTTAGACA 
TGATTGATAAAGAACAGAAAAGCATTGTCATCATGGTTCATATTTATGAGGATGGCAT 
TCCAGGGACCGAAGCCATGAATGGTTGCATGATCCGCCTTGCCGCAGAGTACCCAGCT 
GTCAAGTTCTGCAAGGTGAAGAGCTCAGTTATTGGCGCCAGCAGTCAGTTCACCAGGA 
ATGCCCTTCCTGCCCTGCTGATCTATAAGGGGGGTGAATTGATCGGCAATTTTGTTCG 
TGTTACTGACCAGCTGGGGGATGATTTCTTTGCTGTGGACCTTGAAGCTTTTCTCCAG 
GAATTTGGATTACTCCCAGAAAAGGAAGTCTTGGTGCTGACATCTGTGCGTAACTCTG 
CCACGTGTCACAGTGAGGATAGCGACCTGGAAATAGATTGAACTGATAGTCTAGTTGC 
ATAGATTTCTCATTGTTTGGG 




ORF Start: ATG at 5 


ORF Stop: TGA at 851 




SEQ ID NO: 176 


282 aa 


MW at 32598.5kD 


NOV57c, 

CG59354-03 Protein Sequence 


MTTLYDKLLGEKLQYYYSSSEDEDSDHEDKDRGISVNTGPKGVINDWRRFKQLETEQR 
EEQCREMERLIKKLSMTCRSHLDEEEEQQKQKDLQEKISGKMTLKEFAIMNEDQDDEE 
FLQQYRKQRMEEMRQQLHKGPQFKQVFEISSGEGFLDMIDKEQKSIVIMVHIYEDGIP 
GTEAMNGCMIRLAAEYPAVKFCKVKSSVIGASSQFTRNALPALLIYKGGELIGNFVRV 
TDQLGDDFFAVDLEAFLQEFGLLPEKEVLVLTSVRNSATCHSEDSDLEID 
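
By way of illustration only, the ORF coordinates reported in Table 57A can be checked against the stated polypeptide lengths. The following sketch is not part of the original analysis. It assumes that both "ORF Start" and "ORF Stop" give the 1-based position of the first base of the respective codon; the function name orf_protein_length is hypothetical.

```python
def orf_protein_length(start: int, stop: int) -> int:
    """Number of residues encoded by an ORF.

    start: 1-based position of the first base of the start codon (assumed).
    stop:  1-based position of the first base of the stop codon (assumed).
    """
    coding_bases = stop - start
    if coding_bases <= 0 or coding_bases % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return coding_bases // 3


# (ORF start, ORF stop, stated length in aa) as listed in Table 57A
table_57a = {
    "NOV57a": (6, 726, 240),
    "NOV57b": (5, 749, 248),
    "NOV57c": (5, 851, 282),
}

for name, (start, stop, stated) in table_57a.items():
    # Each stated length should equal the number of codons before the stop
    assert orf_protein_length(start, stop) == stated, name
```

Under that assumption, each coordinate pair in Table 57A yields the stated number of amino acids.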



Sequence comparison of the above protein sequences yields the sequence 
relationships shown in Table 57B. 
Table 57B. Comparison of NOV57a against NOV57b through NOV57c. 

Protein Sequence | NOV57a Residues; Match Residues | Identities; Similarities for the Matched Region 
NOV57b | 58..240; 100..248 | 138/183 (75%); 140/183 (76%) 
NOV57c | 58..240; 100..282 | 182/183 (99%); 182/183 (99%) 
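
The percentages in Table 57B can be reproduced from the match counts. The following is a minimal sketch, assuming that the percentages are truncated rather than rounded (an assumption that is consistent with the entries of Table 57B, e.g. 140/183 is reported as 76%). The helper names are hypothetical.

```python
def parse_ratio(cell: str):
    """Split a cell such as '138/183' into (matches, aligned_length)."""
    matches, aligned = cell.split("/")
    return int(matches), int(aligned)


def percent_truncated(matches: int, aligned: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting style)."""
    return matches * 100 // aligned


for cell in ("138/183", "140/183", "182/183"):
    matches, aligned = parse_ratio(cell)
    print(cell, f"({percent_truncated(matches, aligned)}%)")
```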



Further analysis of the NOV57a protein yielded the following properties shown in 
Table 57C. 



Table 57C. Protein Sequence Properties NOV57a 


PSort analysis: | 0.6500 probability located in cytoplasm; 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability located in endoplasmic reticulum (membrane) 
SignalP analysis: | No Known Signal Sequence Predicted 



A search of the NOV57a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 57D. 



Table 57D. Geneseq Results for NOV57a 


Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV57a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value 
AAE03537 | Human secreted protein variant, SEQ ID NO: 228 - Homo sapiens, 301 aa. [WO200132675-A1, 10-MAY-2001] | 1..240; 1..301 | 226/301 (75%); 230/301 (76%) | e-117 
AAY99657 | Human GTPase associated protein-8 - Homo sapiens, 301 aa. [WO200031263-A2, 02-JUN-2000] | 1..240; 1..301 | 226/301 (75%); 230/301 (76%) | e-117 
AAE02004 | Fruitfly viral IAP-associated factor (VIAF) - Drosophila melanogaster, 240 aa. [WO200134798-A1, 17-MAY-2001] | 55..214; 59..213 | 52/161 (32%); 86/161 (53%) | 3e-14 
AAE02003 | Zebrafish viral IAP-associated factor (VIAF) - Brachydanio rerio, 239 aa. [WO200134798-A1, 17-MAY-2001] | 2..237 | 117/241 (47%) | 5e-13 
AAE02002 | Mouse viral IAP-associated factor (VIAF) - Mus musculus, 240 aa. [WO200134798-A1, 17-MAY-2001] | 58..240; 52..240 | 59/195 (30%); 99/195 (50%) | 4e-12 
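
The Expect values in Table 57D are given in abbreviated form, with an entry such as "e-117" lacking a leading mantissa. The following sketch shows one way such entries might be converted to numbers so that hits can be ranked. It assumes that a bare "e-117" stands for 1e-117; the function name parse_evalue is hypothetical.

```python
def parse_evalue(text: str) -> float:
    """Convert an Expect value string to a float.

    A bare exponent such as 'e-117' is read as 1e-117 (assumption).
    """
    text = text.strip()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


# Geneseq identifier -> Expect value, as listed in Table 57D
hits = {
    "AAE03537": "e-117",
    "AAY99657": "e-117",
    "AAE02004": "3e-14",
    "AAE02003": "5e-13",
    "AAE02002": "4e-12",
}

# Smaller Expect values indicate more significant matches
ranked = sorted(hits, key=lambda ident: parse_evalue(hits[ident]))
print(ranked)
```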



In a BLAST search of public sequence databases, the NOV57a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 57E. 



Table 57E. Public BLASTP Results for NOV57a 


Protein Accession Number | Protein/Organism/Length | NOV57a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value 
Q96AF1 | HYPOTHETICAL 34.3 KDA PROTEIN - Homo sapiens (Human), 301 aa. | 1..240; 1..301 | 226/301 (75%); 230/301 (76%) | e-117 
Q13371 | Phosducin-like protein (PHLP) - Homo sapiens (Human), 301 aa. | 1..240; 1..301 | 225/301 (74%); 230/301 (75%) | e-116 
T17321 | hypothetical protein DKFZp564M1863.1 - human, 301 aa. | 1..240; 1..301 | 225/301 (74%); 230/301 (75%) | e-116 
Q923E8 | RIKEN CDNA 1200011E13 GENE - Mus musculus (Mouse), 301 aa. | 1..240; 1..301 | 210/301 (69%); 223/301 (73%) | e-109 
Q63737 | Phosducin-like protein (PHLP) - Rattus norvegicus (Rat), 301 aa. | 1..240; 1..301 | 210/301 (69%); 223/301 (73%) | e-108 



PFam analysis predicts that the NOV57a protein contains the domains shown in 
Table 57F. 
Table 57F. Domain Analysis of NOV57a 

Pfam Domain | NOV57a Match Region | Identities; Similarities for the Matched Region | Expect Value 
Phosducin: domain 1 of 2 | 35..57 | 14/23 (61%); 21/23 (91%) | 8.7e-08 
Phosducin: domain 2 of 2 | 58..240 | 133/183 (73%); 174/183 (95%) | 9.7e-148 
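
The match regions in Table 57F are inclusive residue ranges, so the length of each region equals the denominator of the corresponding identity ratio. A minimal sketch of this check follows; the function name span_length is hypothetical.

```python
def span_length(residue_range: str) -> int:
    """Length of an inclusive residue range written as 'first..last'."""
    first, last = (int(part) for part in residue_range.split(".."))
    return last - first + 1


# Match regions and aligned lengths from Table 57F
assert span_length("35..57") == 23     # 14/23 identities
assert span_length("58..240") == 183   # 133/183 identities
```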



Example 58. 

The NOV58 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 58A. 



Table 58A. NOV58 Sequence Analysis 




SEQ ID NO: 177 


756 bp 


NOV58a, 

CG59319-01 DNA Sequence 


GGATCCCAATGAAGATACAGAATGGAATGACATTTTAAGAGATTTCGGCATTCTTCCT 
CCTAAAGAAGAGTCAAAAGATGAAATTGAAGAAATGGTTTTACGTTTACAGAAAGAAG 
CAATGGTGAAACCATTTGAAAAGATGACTCTTGCACAGCTAAAGGAAGCTGAAGATGA 
ATTCGATGAAGAAGATATGCAGGCTGTTGAAACATATAGAAAGAAGCGGTTACAGGAA 
TGGAAAGCTCTTAAGAAAAAACAAAAATTTGGAGAATTAAGAGAAATTTCTGGAAATC 
AGTATGTGAATGAAGTCACAAATGCAGAAGAAGATGTGTGGGTTATAATTCATCTATA 
CAGATCAAGCATCCCAATGTGTTTGTTGGTTAACCAGCATCTTAGTCTTCTAGCAAGA 
AAGTTTCCAGAAACTAAATTTGTTAAAGCCATCGTGAATAGCTGTATTCAACACTACC 
ATGACAATTGTTTACCAACAATTTTTGTGTATAAAAATGGTCAGATAGAAGCCAAATT 
CATTGGAATTATAGAATGTGGAGGGATAAATCTCAAGCTGGAAGAACTTGAATGGAAG 
CTAGCAGAAGTTGGAGCAATACAGACTGATTTGGAAGAAAACCCCAGAAAAGACATGG 
TAGATATGATGGTATCTTCAATTAGAAACACTTCTATTCATGATGACAGTGATAGCTC 
CAACAGTGATAATGATACCAAATAGAGAGAAATATTCAATAAATAGCTTTTAGTAAAA 




AA 








ORF Start: GAT at 2 


ORF Stop: TAG at 719 




SEQ ID NO: 178 


239 aa 


MW at 27811.3kD 


NOV58a, 

CG59319-01 Protein Sequence 


DPNEDTEWNDILRDFGILPPKEESKDEIEEMVLRLQKEAMVKPFEKMTLAQLKEAEDE 
FDEEDMQAVETYRKKRLQEWKALKKKQKFGELREISGNQYVNEVTNAEEDVWVIIHLY 
RSSIPMCLLVNQHLSLIARKFPETKFVKAIVNSCIQHYHDNCLPTIFVYKNGQIEAKF 
IGIIECGGINLKLEELEWKLAEVGAIQTDLEENPRKDMVDMMVSSIRNTSIHDDSDSS 
NSDNDTK 




SEQ ID NO: 179 


745 bp 


NOV58b, 

CG59319-02 DNA Sequence 


GGATCCCAATGAAGATACAGAATGGATCCCAATGAAGATACAGAATGGAATGACATTT 
TAAGAGATTTCGGCATTCTTCCTCCTAAAGAAGAGTCAAAAGATGAAATTGAAGAAAT 
GGTTTTACGTTTACAGAAAGAAGCAATGGTGAAACCATTTGAAAAGATGACTCTTGCA 
CAGCTAAAGGAAGCTGAAGATGAATTTGATGAAGAAGATATGCAGGCTGTTGAAACAT 
ATAGAAAGAAGCGGTTACAGGAATGGAAAGCTCTTAAGAAAAAACAAAAATTTGGAGA 
ATTAAGAGAAATTTCTGGAAATCAGTATGTGAATGAAGTCACAAATGCAGAAGAAGAT 
GTGTGGGTTATAATTCATCTATACAGATCAAGCATCCCAATGTGTTTGTTGGTTAACC 
AGCATCTTAGTCTTCTAGCAAGAAAGTTTCCAGAAACTAAATTTGTTAAAGCCATCGT 
GAATAGCTGTATTCAACACTACCATGACAATTGTTTACCAACAATTTTTGTGTATAAA 
AATGGTCAGATAGAAGCCAAATTCATTGGAATTATAGAATGTGGAGGGATAAATCTCA 
AGCTGGAAGAACTTGAATGGAAGCTAGCAGAAGTTGGAGCAATACAGACTGATTTGGA 
AGAAAACCCCAGAAAAGACATGGTAGATATGATGGTATCTTCAATTAGAAACACTTCT 
ATCCATGATGACAGTGATAGCTCCAACAGTGATAATGATACCAAATAGA 




ORF Start: ATG at 22 


ORF Stop: TAG at 742 




SEQ ID NO: 180 


240 aa 


MW at 27942.5kD 


NOV58b, 

CG59319-02 Protein Sequence 



MDPNEDTEWNDILRDFGILPPKEESKDEIEEMVLRLQKEAMVKPFEKMTLAQLKEAED 
EFDEEDMQAVETYRKKRLQEWKALKKKQKFGELREISGNQYVNEVTNAEEDVWVIIHL 
YRSSIPMCLLVNQHLSLIARKFPETKFVKAIVNSCIQHYHDNCLPTIFVYKNGQIEAK 
FIGIIECGGINLKLEELEWKLAEVGAIQTDLEENPRKDMVDMMVSSIRNTSIHDDSDS 
SNSDNDTK 
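
The correspondence between a DNA sequence and its predicted polypeptide in Table 58A can be illustrated by translating the open reading frame with the standard genetic code. The sketch below is an illustration only and is not the method used to generate the tables. It translates the first 58 bases of the NOV58b DNA sequence from the reported ORF start (ATG at position 22, taken as 1-based); the names CODON_TABLE and translate are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str, orf_start: int) -> str:
    """Translate from a 1-based ORF start until a stop codon or end of input."""
    seq = "".join(dna.split()).upper()
    residues = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon
            break
        residues.append(aa)
    return "".join(residues)


# First line of the NOV58b DNA sequence (SEQ ID NO: 179)
first_line = "GGATCCCAATGAAGATACAGAATGGATCCCAATGAAGATACAGAATGGAATGACATTT"
print(translate(first_line, 22))  # MDPNEDTEWNDI
```

The translated residues agree with the first twelve residues of the NOV58b protein sequence.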



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 58B. 



Table 58B. Comparison of NOV58a against NOV58b. 

Protein Sequence | NOV58a Residues; Match Residues | Identities; Similarities for the Matched Region 
NOV58b | 1..239; 2..240 | 216/239 (90%); 216/239 (90%) 



Further analysis of the NOV58a protein yielded the following properties shown in 
Table 58C. 



Table 58C. Protein Sequence Properties NOV58a 

PSort analysis: | 0.8800 probability located in nucleus; 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability located in endoplasmic reticulum (membrane) 
SignalP analysis: | No Known Signal Sequence Predicted 
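
The PSort entry of Table 58C lists several candidate localizations with associated scores. The following minimal sketch extracts the scores from the text of the entry and selects the highest-scoring site. It assumes the "probability located in" wording shown in the table; the function name parse_psort is hypothetical.

```python
import re

PATTERN = re.compile(r"(\d+\.\d+) probability located in ([^;]+)")


def parse_psort(text: str) -> dict:
    """Map each predicted site to its score."""
    flat = " ".join(text.split())  # collapse line breaks and extra spaces
    return {site.strip(): float(score) for score, site in PATTERN.findall(flat)}


# PSort entry for NOV58a, as given in Table 58C
psort_58a = """0.8800 probability located in nucleus; 0.1000 probability located in
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen);
0.0000 probability located in endoplasmic reticulum (membrane)"""

scores = parse_psort(psort_58a)
best_site = max(scores, key=scores.get)
print(best_site, scores[best_site])
```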



A search of the NOV58a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 58D. 



Table 58D. Geneseq Results for NOV58a 


Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV58a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value 
AAE02003 | Zebrafish viral IAP-associated factor (VIAF) - Brachydanio rerio, 239 aa. [WO200134798-A1, 17-MAY-2001] | 1..237; 3..239 | 133/238 (55%); 181/238 (75%) | 3e-75 
AAU27979 | Mouse contig polypeptide sequence #132 - Mus musculus, 243 aa. [WO200164834-A2, 07-SEP-2001] | 1..231; 7..240 | 137/234 (58%); 176/234 (74%) | 2e-74 
AAU27807 | Human full-length polypeptide sequence #132 - Mus musculus, 239 aa. [WO200164834-A2, 07-SEP-2001] | 1..231; 3..236 | 137/234 (58%); 176/234 (74%) | 2e-74 
AAE02001 | Human viral IAP-associated factor (VIAF) - Homo sapiens, 239 aa. [WO200134798-A1, 17-MAY-2001] | 1..231; 3..236 | 137/234 (58%); 176/234 (74%) | 2e-74 
AAB68507 | Human GTP-binding associated protein #7 - Homo sapiens, 239 aa. [WO200105970-A2, 25-JAN-2001] | 1..231; 3..236 | 137/234 (58%); 176/234 (74%) | 2e-74 



In a BLAST search of public sequence databases, the NOV58a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 58E. 



Table 58E. Public BLASTP Results for NOV58a 


Protein Accession Number | Protein/Organism/Length | NOV58a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value 
Q9CQU4 | 1700010B22RIK PROTEIN - Mus musculus (Mouse), 240 aa. | 1..239; 3..240 | 208/239 (87%); 229/239 (95%) | e-121 
Q9WUP3 | PDCL2 - Mus musculus (Mouse), 238 aa (fragment). | 1..239; 1..238 | 207/239 (86%); 228/239 (94%) | e-121 
Q9DA99 | 1700016K07RIK PROTEIN - Mus musculus (Mouse), 192 aa. | 47..239; 1..192 | 165/193 (85%); 183/193 (94%) | 3e-94 
CAC40345 | SEQUENCE 5 FROM PATENT WO0134798 - Brachydanio rerio (Zebrafish) (Zebra danio), 239 aa. | 1..237; 3..239 | 133/238 (55%); 181/238 (75%) | 1e-74 
Q9H2J4 | HTPHLP (UNKNOWN) (PROTEIN FOR MGC:3062) - Homo sapiens (Human), 239 aa. | 1..231; 3..236 | 137/234 (58%); 176/234 (74%) | 8e-74 



PFam analysis predicts that the NOV58a protein contains the domains shown in 
Table 58F. 
Table 58F. Domain Analysis of NOV58a 

Pfam Domain | NOV58a Match Region | Identities; Similarities for the Matched Region | Expect Value 
Phosducin: domain 1 of 1 | 60..175 | 32/120 (27%); 55/120 (46%) | 5.8 



Example 59. 

The NOV59 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 59A. 



Table 59A. NOV59 Sequence Analysis 




SEQ ID NO: 181 


981 bp 


NOV59a, 

CG59576-01 DNA Sequence 


GCCACCGCGCCCAGCTGGCTTTTGTTTTTTATCCTTCTGCTCCTCATTTACCTATTCA 
CCATCATTGGTAGTCTTATGGTGTTCTTTGCCATCAAACTGGATTTCTGCCTGCACAG 
CTCCTTCTATTTCTTCATCAGTGTCCTCTCCTTCCTAGAGATCTGGTATACCACCATC 
ACCATCCCCAAGATGTTCTTCAACCTAGCCAGTGAGCAGAAGACCACCTCCCTGGATG 
GTTGCCTATTGCAGATGTATTTCTTTTACTCCCTCGGCATCACTGAGGTTTGCTTGCT 
CACCACCAGGGCTATGGACAGATACCTGGCCATCTGTAATCACCTTTGCTACCCCACA 
GTCACGACACCTCAGCTCTACACTCAGGTGATTCTAGGTTGTTGCATCTGTGGCTTCT 
TCACGCTGCTCCCTGAGATTGCTTGGATATCCACACTGCCATTTTGTGGTCCAAATCA 
AATCCACAACATTTTCTGTGACCTTGATCCTATCCTGAATCTAGCATGTGTAGACACT 
GGCCCAGTTGTTTTAATCAAGGTTGTGGACATTGTACATGCTGTGGAGATCATCACAG 
CTATAATGCTTGTGACTTTGGCTTACGTCCAAATTATTGCAGTGATCCTAAGAAACTG 
CTCTGCTGATGGATGCCAAAAGGCATTTTCTACCTATGCTTTCCACCTTGCTATTTTC 
TTAATCTTTTTTGGAAGTGTAGCCCTGATGTACCTGCTCTTCTCTGCCAAGTACTCCT 
TTTTCTGGGACACAACCATCAGCCTAATGTTTGCAGTGCTGTCACCGACAACAATCAT 
CTGTAGTCTGAGGAATAAAGAGATAAAGGAAGCAATAAAAAAGCACATGTGCCAATCA 
ATGATATGCACACATCATGTCAAATAAGACCAAATACACACCTCTTAATTACCAAAGA 




ATATTTATACAAATATTTACATTAATACGTTCAGTGTGTTTGTTGCTGCTGTG 




ORF Start: GCC at 1 


ORF Stop: TAA at 895 




SEQ ID NO: 182 


298 aa 


MW at 33780.0kD 


NOV59a, 

CG59576-01 Protein Sequence 


ATAPSWLLFFILLLLIYLFTIIGSLMVFFAIKLDFCLHSSFYFFISVLSFLEIWYTTI 
TIPKMFFNLASEQKTTSLDGCLLQMYFFYSLGITEVCLLTTRAMDRYLAICNHLCYPT 
VTTPQLYTQVILGCCICGFFTLLPEIAWISTLPFCGPNQIHNIFCDLDPILNLACVDT 
GPVVLIKVVDIVHAVEIITAIMLVTLAYVQIIAVILRNCSADGCQKAFSTYAFHLAIF 
LIFFGSVALMYLLFSAKYSFFWDTTISLMFAVLSPTTIICSLRNKEIKEAIKKHMCQS 
MICTHHVK 
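
A molecular weight of the kind reported alongside each protein sequence can be estimated by summing average residue masses and adding one water molecule. The sketch below is an illustration under stated assumptions: the residue masses are standard textbook average values and are not taken from this specification, and the result may differ slightly from the tabulated figures depending on the mass table used. The stated figure for NOV59a (33780.0 for 298 residues) appears to be numerically expressed in daltons. The function name molecular_weight is hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
# Standard textbook values, assumed here for illustration.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Estimated average molecular weight of a polypeptide, in daltons."""
    seq = "".join(protein.split()).upper()
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# Example: the final eight residues of the NOV59a protein sequence
print(f"{molecular_weight('MICTHHVK'):.1f} Da")
```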



Further analysis of the NOV59a protein yielded the following properties shown in 
Table 59B. 



Table 59B. Protein Sequence Properties NOV59a 


PSort analysis: | 0.6400 probability located in plasma membrane; 0.4600 probability located in Golgi body; 0.3700 probability located in endoplasmic reticulum (membrane); 0.1000 probability located in endoplasmic reticulum (lumen) 
SignalP analysis: | Likely cleavage site between residues 25 and 26 
A search of the NOV59a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 59C. 



Table 59C. Geneseq Results for NOV59a 


Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV59a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value 
AAG72586 | Human OR-like polypeptide query sequence, SEQ ID NO: 2267 - Homo sapiens, 289 aa. [WO200127158-A2, 19-APR-2001] | 7..295; 1..289 | 286/289 (98%); 286/289 (98%) | e-167 
AAG71784 | Human olfactory receptor polypeptide, SEQ ID NO: 1465 - Homo sapiens, 289 aa. [WO200127158-A2, 19-APR-2001] | 7..295; 1..289 | 286/289 (98%); 286/289 (98%) | e-167 
AAG71785 | Human olfactory receptor polypeptide, SEQ ID NO: 1466 - Homo sapiens, 318 aa. [WO200127158-A2, 19-APR-2001] | 5..292; 20..311 | 175/293 (59%); 217/293 (73%) | 6e-95 
AAU24721 | Human olfactory receptor AOLFR220 - Homo sapiens, 343 aa. [WO200168805-A2, 20-SEP-2001] | 7..283; 53..328 | 170/279 (60%); 212/279 (75%) | 4e-94 
AAG71808 | Human olfactory receptor polypeptide, SEQ ID NO: 1489 - Homo sapiens, 317 aa. [WO200127158-A2, 19-APR-2001] | 7..283; 29..304 | 170/279 (60%); 212/279 (75%) | 4e-94 



In a BLAST search of public sequence databases, the NOV59a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 59D. 



Table 59D. Public BLASTP Results for NOV59a 


Protein Accession Number | Protein/Organism/Length | NOV59a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value 
Q96R35 | OLFACTORY RECEPTOR - Homo sapiens (Human), 216 aa (fragment). | 50..267; 1..216 | 107/218 (49%); 146/218 (66%) | 7e-55 
Q9EPG2 | M51 OLFACTORY RECEPTOR - Mus musculus (Mouse), 314 aa. | 19..303 | 172/289 (58%) | 2e-52 
O95007 | Olfactory receptor 6B1 (Olfactory receptor 7-3) (OR7-3) - Homo sapiens (Human), 311 aa. | 10..285; 28..301 | 109/279 (39%); 170/279 (60%) | 1e-51 
Q9QWU6 | OLFACTORY RECEPTOR 17 - Mus musculus (Mouse), 327 aa. | 1..289; 20..314 | 111/298 (37%); 171/298 (57%) | 2e-50 
P23270 | Olfactory receptor-like protein 17 - Rattus norvegicus (Rat), 327 aa. | 1..289; 20..314 | 111/298 (37%); 171/298 (57%) | 2e-50 



PFam analysis predicts that the NOV59a protein contains the domains shown in 
Table 59E. 



Table 59E. Domain Analysis of NOV59a 


Pfam Domain | NOV59a Match Region | Identities; Similarities for the Matched Region | Expect Value 
7tm_1: domain 1 of 1 | 37..164 | 30/134 (22%); 90/134 (67%) | 5.4e-13 



Example 60. 



The NOV60 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 60A. 



Table 60A. NOV60 Sequence Analysis 




SEQ ID NO: 183 


1201 bp 


NOV60a, 

CG59557-01 DNA Sequence 


AGGATAACTTTATATGTTGCAAAATGACTCACATAGTATATTTTATTTAACCAGCCTA 


ATTTCAAGGCTGTTTAGTTGCTTGAAAAGAAGGTTTTTATTTGTTCTTTGCATGTACT 


TAGAATGCTGACTGTGTTTTATGAGCCAACAAGTGAAACCGCTGAAAATATGGATCCA 
GAGAATCAGACAATGGTGACTGAGTTTTATTTCTCTGATTTTCCTCAATCTAAGAATG 
GCAGCCTCTTATTCTTCATTCCTATGCTCTTTATTTATATATTCATTCTTGTTGGAAA 
TTTCATGATTTTCTTTGCTGTCCGACCGGACCCCCATCTCCATAATCCTATGTACAGT 
TTTATCAGTGTCTTCTCCTTCCTGGAGATTTGGTACACCACCGTGACTATCCCCAAGA 
TGCTCTCCAACCTTCTCAGTGAACAGAAAACCATCTCTTTCATAGGTTGCCTCCTGCA 
GATGTACTTCTTCCACTCACTCGGGGTCACAGAAGCCCTAGTCCTCACAGTGATGGCC 
ATTGACAGGTGTGTAGCCATCTGCAACCCCCTTCGCTATGCAATCACTATGTCCCCTA 
GACTGTGCATCCAGCTCTCCACTGGCTCTTGCATTTTTGGCTTCCTCATGTTACTGCC 
AGAGATTGTGTGCATTTCCACTCTTCCATTCTGTGGCGCCAACCAAATTCATCAACTC 
TTTTGTGACTTTGAACCTGTGCTGCAGTTAGCCTGCACAGATACGTACATAATTCTGG 
TTCAAGATCTGATCCGTGCTATTTCCATTCTGACCTCTGTCTCTGTCATCACCCTTTT 
CTATTTAAGAATCATCACGGTGATCCTGAGGATTCCCTCTGGTGAGAGTCGTCAGAAG 
GCTTTCTTCACATGTGCAGCCCACATTGCTATTTTCTTGCTGTTTTTTGGCAGTGTGT 
CACTCATGTATCTGCGCTTCTCTGTCACATTCCCACCATTACTGGACAAGGCCATTGC 
ACTGATGTTTGCTGTCCTTGCCCTACTTTTCAACCCAGTAATCTATAGTCTGAGGAAC 
AAAGATATGAAAAACGCCACCAAGAAAATCCTCTGTTCTCAAAAGATGTTCAATGCCT 
CTGGGAGCTAATGGAGTTCACACACACCTCTTCAAAGAAATCTCATCATCTCCTTAAG 


TTTAAAATGCTAACAAATCAGTTTTTTTAAATTACCATGCA 




ORF Start: ATG at 121 


ORF Stop: TAA at 1111 




SEQ ID NO: 184 


330 aa MW at 37439.1kD 


NOV60a, 

CG59557-01 Protein Sequence 


MLTVFYEPTSETAENMDPENQTMVTEFYFSDFPQSKNGSLLFFIPMLFIYIFILVGNF 
MIFFAVRPDPHLHNPMYSFISVFSFLEIWYTTVTIPKMLSNLLSEQKTISFIGCLLQM 
YFFHSLGVTEALVLTVMAIDRCVAICNPLRYAITMSPRLCIQLSTGSCIFGFLMLLPE 
IVCISTLPFCGANQIHQLFCDFEPVLQLACTDTYIILVEDVIRAISILTSVSVITLFY 
LRIITVILRIPSGESRQKAFFTCAAHIAIFLLFFGSVSLMYLRFSVTFPPLLDKAIAL 
MFAVLALLFNPVIYSLRNKDMKNATKKILCSQKMFNASGS 



Further analysis of the NOV60a protein yielded the following properties shown in 
Table 60B. 



Table 60B. Protein Sequence Properties NOV60a 


PSort analysis: | 0.6000 probability located in plasma membrane; 0.4000 probability located in Golgi body; 0.3000 probability located in endoplasmic reticulum (membrane); 0.0300 probability located in mitochondrial inner membrane 
SignalP analysis: | Likely cleavage site between residues 67 and 68 



A search of the NOV60a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 60C. 



Table 60C. Geneseq Results for NOV60a 


Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV60a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value 
AAG71807 | Human olfactory receptor polypeptide, SEQ ID NO: 1488 - Homo sapiens, 319 aa. [WO200127158-A2, 19-APR-2001] | 16..330; 1..315 | 313/315 (99%); 314/315 (99%) | e-180 
AAG71803 | Human olfactory receptor polypeptide, SEQ ID NO: 1484 - Homo sapiens, 315 aa. [WO200127158-A2, 19-APR-2001] | 16..329; 1..314 | 219/314 (69%); 259/314 (81%) | e-129 
AAU24658 | Human olfactory receptor AOLFR156 - Homo sapiens, 331 aa. [WO200168805-A2, 20-SEP-2001] | 9..329; 10..330 | 218/321 (67%); 259/321 (79%) | e-128 
AAU24721 | Human olfactory receptor AOLFR220 - Homo sapiens, 343 aa. [WO200168805-A2, 20-SEP-2001] | 20..329; 33..342 | 196/310 (63%); 234/310 (75%) | e-111 
AAG71808 | Human olfactory receptor polypeptide, SEQ ID NO: 1489 - Homo sapiens, 317 aa. [WO200127158-A2, 19-APR-2001] | 20..323; 9..312 | 195/304 (64%); 232/304 (76%) | e-111 






WO 02/072757 PCT/US02/06908 

In a BLAST search of public sequence databases, the NOV60a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 60D. 



Table 60D. Public BLASTP Results for NOV60a 


Protein Accession Number | Protein/Organism/Length | NOV60a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value 
Q9WU86 | ODORANT RECEPTOR S1 - Mus musculus (Mouse), 324 aa. | 15..324; 8..320 | 135/315 (42%); 188/315 (58%) | 4e-67 
Q9EPG2 | M51 OLFACTORY RECEPTOR - Mus musculus (Mouse), 314 aa. | 20..325; 5..311 | 129/307 (42%); 189/307 (61%) | 4e-65 
P23270 | Olfactory receptor-like protein 17 - Rattus norvegicus (Rat), 327 aa. | 24..319; 10..310 | 126/301 (41%); 182/301 (59%) | 8e-65 
Q9QWU6 | OLFACTORY RECEPTOR 17 - Mus musculus (Mouse), 327 aa. | 16..319; 1..310 | 128/310 (41%); 184/310 (59%) | 9e-64 
O13036 | CHICK OLFACTORY RECEPTOR 7 - Gallus gallus (Chicken), 323 aa. | 16..319; 1..305 | 122/305 (40%); 187/305 (61%) | 1e-63 



PFam analysis predicts that the NOV60a protein contains the domains shown in 
Table 60E. 



Table 60E. Domain Analysis of NOV60a 


Pfam Domain | NOV60a Match Region | Identities; Similarities for the Matched Region | Expect Value 
7tm_1: domain 1 of 1 | 56..304 | 45/270 (17%); 172/270 (64%) | 2.4e-21 



Example 61. 



The NOV61 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 61A. 



Table 61A. NOV61 Sequence Analysis 

SEQ ID NO: 185 


1061 bp 


NOV61a, 

CG59555-01 DNA Sequence 


CAATCTGGTCCTAAGTGATCTTTTTCTTTTTCACAGGGAAATGGGGGAAAATCAGACA 


ATGGTCACAGAGTTCCTCCTACTGGGATTTCTCCTGGGCCCAAGGATTCAGATGCTCC 
TCTTTGGGCTCTTCTCCCTGTTCTATATCTTCACCCTGCTGGGGAACGGGGCCATCCT 
GGGGCTCATCTCACTGGACTCCAGACTCCACACCCCCATGTACTTCTTCCTCTCACAC 
CTGGCTGTCGTCGACATCGCCTACACCCGCAACACGGTGCCCCAGATGCTGGCGAACC 
TCCTGCATCCAGCCAAGCCCATCTCCTTTGCTGGCTGCATGACGCAGACCTTTCTCTG 
TTTGAGTTTTGGACACAGCGAATGTCTCCTGCTGGTGCTGATGTCCTACGATCGTTAC 
GTGGCCATCTGCCACCCTCTCCGATACTCCGTCATCATGACCTGGAGAGTCTGCATCA 
CCCTGGCCGTCACTTCCTGGACGTGTGGCTCCCTCCTGGCTCTGGCCCATGTGGTTCT 

CTGTCTGTCCTCAGGCTGGCCTGTGCTGACACCTGGCTCAACCAGGTGGTCATCTTTG 
CAGCCTGCGTGTTCTTCCTGGTGGGGCCACCCAGCCTGGTGCTTGTCTCCTACTCGCA 
CATCCTGGCGGCCATCCTGAGGATCCAGTCTGGGGAGGGCCGCAGAAAGGCCTTCTCC 
ACCTGCTCCTCCCACCTCTGCGTGGTGGGACTCTTCTTTGGCAGTGCCATCATCATGT 
ACATGGCCCCCAAGTCCCGCCATCCTGAGGAGCAGCAAAAGGTCTTTTTTCTATTTTA 
CAGTTTTTTCAACCCAACACTTAACCCCCTGATTTACAGCCTGAGGAACGGAGAGGTC 
AAGGGTGCCCTGAGGAGAGCACTGGGCAAGGAAAGTCATTCCTAACTGGTGTGACATT 
TGACTCTCCCTCCTCAGTCATCTCCTGGAATCTTGGTACCAAATACCACCTAAGTTCA 


CTACTCTCTTTATATCA 




ORF Start: ATG at 41 


ORF Stop: TAA at 971 




SEQ ID NO: 186 


310 aa 


MW at 34713.8kD 


NOV61a, 

CG59555-01 Protein Sequence 


MGENQTMVTEFLLLGFLLGPRIQMLLFGLFSLFYIFTLLGNGAILGLISLDSRLHTPM 
YFFLSHLAVVDIAYTRNTVPQMLANLLHPAKPISFAGCMTQTFLCLSFGHSECLLLVL 
MSYDRYVAICHPLRYSVIMTWRVCITLAVTSWTCGSLLALAHVVLUjRLPFSGPHEIN 
HFFCEILSVLRLACADTWLNQVVIFAACVFFLVGPPSLVLVSYSHILAAILRIQSGEG 
RRKAFSTCSSHLCVVGLFFGSAIIMYMAPKSRHPEEQQKVFFLFYSFFNPTLNPLIYS 
LRNGEVKGALRRALGKESHS 



Further analysis of the NOV61a protein yielded the following properties shown in 
Table 61B. 



Table 61B. Protein Sequence Properties NOV61a 

PSort analysis: | 0.6400 probability located in plasma membrane; 0.4600 probability located in Golgi body; 0.3700 probability located in endoplasmic reticulum (membrane); 0.1000 probability located in endoplasmic reticulum (lumen) 
SignalP analysis: | Likely cleavage site between residues 43 and 44 
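
A predicted cleavage site "between residues 43 and 44" divides the polypeptide into a putative signal portion (residues 1 to 43) and a remainder beginning at residue 44. The sketch below illustrates the split on the first 58 residues of the NOV61a protein sequence. It is an illustration only; the function name split_at_cleavage is hypothetical, and residue numbering is assumed to be 1-based.

```python
def split_at_cleavage(protein: str, last_signal_residue: int):
    """Split a sequence after the given 1-based residue number."""
    seq = "".join(protein.split())
    return seq[:last_signal_residue], seq[last_signal_residue:]


# First 58 residues of the NOV61a protein sequence (SEQ ID NO: 186)
nov61a_start = "MGENQTMVTEFLLLGFLLGPRIQMLLFGLFSLFYIFTLLGNGAILGLISLDSRLHTPM"

signal, remainder = split_at_cleavage(nov61a_start, 43)
print(signal)     # residues 1..43
print(remainder)  # residues 44..58 of the excerpt
```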



A search of the NOV61a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 61C. 
Table 61C. Geneseq Results for NOV61a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV61a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value 
AAM29935 | Peptide #3972 encoded by probe for measuring placental gene expression - Homo sapiens, 311 aa. [WO200157272-A2, 09-AUG-2001] | 1..310; 2..311 | 310/310 (100%); 310/310 (100%) | 0.0 
AAM17409 | Peptide #3843 encoded by probe for measuring cervical gene expression - Homo sapiens, 311 aa. [WO200157278-A2, 09-AUG-2001] | 1..310; 2..311 | 310/310 (100%); 310/310 (100%) | 0.0 
AAG72949 | Human olfactory receptor data exploratorium sequence, SEQ ID NO: 2631 - Homo sapiens, 314 aa. [WO200127158-A2, 19-APR-2001] | 1..310; 2..311 | 310/310 (100%); 310/310 (100%) | 0.0 
AAG72187 | Human olfactory receptor polypeptide, SEQ ID NO: 1868 - Homo sapiens, 310 aa. [WO200127158-A2, 19-APR-2001] | 1..310; 1..310 | 310/310 (100%); 310/310 (100%) | 0.0 
AAU04577 | Human G-protein coupled receptor like protein, GPCR #11 - Homo sapiens, 308 aa. [WO200153454-A2, 26-JUL-2001] | 1..310; 1..308 | 288/310 (92%); 294/310 (93%) | e-165 



In a BLAST search of public sequence databases, the NOV61a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 61D. 



Table 61D. Public BLASTP Results for NOV61a 


Protein Accession Number | Protein/Organism/Length | NOV61a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value 
Q96R46 | OLFACTORY RECEPTOR - Homo sapiens (Human), 217 aa (fragment). | 67..283; 1..217 | 217/217 (100%); 217/217 (100%) | e-125 
O95047 | Olfactory receptor 2A4 - Homo sapiens (Human), 310 aa. | 1..307; 1..307 | 217/307 (70%); 250/307 (80%) | e-122 
Q9NQN0 | DJ1005H11.1 (7 TRANSMEMBRANE RECEPTOR (OLFACTORY RECEPTOR LIKE) PROTEIN)) - Homo sapiens (Human), 272 aa (fragment). | 39..307; 1..269 | 187/269 (69%); 216/269 (79%) | e-103 
Q9Z1V2 | OLFACTORY RECEPTOR B12 - Mus musculus (Mouse), 223 aa (fragment). | 63..285; 1..223 | 172/223 (77%); 190/223 (85%) | 9e-98 
O43888 | OLFACTORY RECEPTOR - Homo sapiens (Human), 217 aa (fragment). | 67..282; 1..217 | 173/217 (79%); 188/217 (85%) | 1e-97 



PFam analysis predicts that the NOV61a protein contains the domains shown in 
Table 61E. 



Table 61E. Domain Analysis of NOV61a 


Pfam Domain | NOV61a Match Region | Identities; Similarities for the Matched Region | Expect Value 
7tm_1: domain 1 of 1 | 40..289 | 65/269 (24%); 188/269 (70%) | 1.1e-45 



Example 62. 

The NOV62 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 62A. 



Table 62A. NOV62 Sequence Analysis 




SEQ ID NO: 187 


1201 bp 


NOV62a, 

CG59551-01 DNA Sequence 


AGTTGGTTGTAAATAATTCTGCTTATATTACCTACAGAGTAAACATTATAGCATTATC 


ACTCCAGAATCCTTTGTTTCTATGGTTTCC^GATGTTTCCAATGTCTAGATGTTCCAG 


CTGCCCATCTCTGAGAAATCCAGCTGTGTCTCACAATGGATGCCACAGCCTGTAATGA 


ATCAGTGGATGGCTCACCCGTCTTCTATCTATTGGGCATCCCCTCTCTGCCAGAGACC 
TTCTTCCTCCCTGTGTTTTTTATTTTCCTCCTCTTCTACCTTCTCATCCTGATGGGTA 
ATGCCCTGATCCTGGTGGCCGTGGTGGCAGAGCCCAGCCTCCACAAGCCCATGTACTT 
CTTTCTGATCAATCTCTCCACCTTGGACATCCTTTTCAC 

ATGCTGTCCTTATTCTTGCTTGGGGACCGCTTCCTCAGCTTTTCTTCCTGCTTACTGC 

AGATGTACCTCTTCCAAAGTTTTACATGTTCAGAAGCCTTCATCCTGGTGGTCATGGC 

CTATGACCGCTATGTGGCTATCTGCCACCCACTGCACTACCCTGTCCTCATGAACCCA 

CAGACCAATGCTACCTTGGCAGCCAGTGCCTGGCTAACTGCCCTCCTCCTGCCCATCC 

CAGCAGTAGTAAGGACCTCCCAGATGGCATATAACAGCATTGCCTACATCTACCACTG 

CTTCTGTGATCATCTGGCTGTGGTCCAGGCCTCCTGCTCTGACACCACCCCCCAGACC 

CTCATGGGCTTCTGCATCGCCATGGTGGTGTCCTTCCTCCCCCTTCTCCTGGTGCTTC 

TCTCCTATGTCCACATCCTGGCCTCAGTGCTTCGCATCAGTTCCCTAGAAGGACGGGC 

AAAAGCCTTCTCCACCTGCAGCTCCCACCTTCTGGTCGTGGGCACCTACTACTCATCT 

ATTGCCATAGCCTACGTGGCCTACAGGGCTGACCTGCCCCTTGACTTCCATATCATGG 

GCAATGTGGTATATGCCATTCTCACACCAATTCTCAACCCCCTCATTTACACGCTGAG 

AAACAGGGATCTAAAGGCAGCCATCACCAAAATCATGTCTCAAGACCCAGGCTGTGAC 

AGGAGCATTTGACCrTTAAATGCAGCTAACTCTGCTC 

CTTAGCACAGAGAAAGGACTCAATACATGATAATGAAATAA 




ORF Start: ATG at 152 


ORF Stop: TGA at 1112 




SEQ ID NO: 188 


320 aa 


MW at 35502.6kD 


NOV62a, 

CG59551-01 Protein Sequence 


MDATACNESVDGSPVFYLLGIPSLPETFFLPVFFIFLLFYLLILMGNALILVAVVAEP 
SLHKPMYFFLINLSTLDILFTTTTVPKMLSLFLLGDRFLSFSSCLLQMYLFQSFTCSE 
AFILVVMAYDRYVAICHPLHYPVLMNPQTNATLAASAWLTALLLPIPAVVRTSQMAYN 
SIAYIYHCFCDHLAVVQASCSDTTPQTLMGFCIAMVVSFLPLLLVLLSYVHILASVLR 
ISSLEGRAKAFSTCSSHLLVVGTYYSSIAIAYVAYRADLPLDFHIMGNVVYAILTPIL 
NPLIYTLRNRDVKAAITKIMSQDPGCDRSI 



Further analysis of the NOV62a protein yielded the following properties shown in 
Table 62B. 



Table 62B. Protein Sequence Properties NOV62a 


PSort analysis: | 0.6000 probability located in plasma membrane; 0.4000 probability located in Golgi body; 0.3000 probability located in endoplasmic reticulum (membrane); 0.3000 probability located in microbody (peroxisome) 
SignalP analysis: | Likely cleavage site between residues 57 and 58 



A search of the NOV62a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 62C. 



Table 62C. Geneseq Results for NOV62a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #,
Date]


NOV62a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Region


Expect 
Value 


AAG72119 


Human olfactory receptor polypeptide,
SEQ ID NO: 1800 - Homo sapiens,
295 aa. [WO200127158-A2, 19-APR-
2001]


35..290
2..257


213/256 (83%)
228/256 (88%)


e-119 


AAU24639 


Human olfactory receptor AOLFR134
- Homo sapiens, 325 aa.
[WO200168805-A2, 20-SEP-2001]


16..308
17..308


129/293 (44%)
186/293 (63%)


6e-67 


AAG72479 


Human OR-like polypeptide query
sequence, SEQ ID NO: 2160 - Homo
sapiens, 324 aa. [WO200127158-A2,
19-APR-2001]


16..308
17..308


129/293 (44%)
186/293 (63%)


6e-67 


AAG71590 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1271 - Homo sapiens, 
324 aa. [WO200127158-A2, 19-APR- 
2001] 


16..308 
17..308 


129/293 (44%)
186/293 (63%) 


6e-67 


AAG71632 


Human olfactory receptor polypeptide,
SEQ ID NO: 1313 - Homo sapiens,
316 aa. [WO200127158-A2, 19-APR-
2001]


16..315
13..312


126/300 (42%)
179/300 (59%)


3e-64 




In a BLAST search of public sequence databases, the NOV62a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 62D. 



Table 62D. Public BLASTP Results for NOV62a 


Protein 
Accession 
Number


Protein/Organism/Length


NOV62a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Portion


Expect 
Value 


Q9Z236 


OLFACTORY RECEPTOR - 
Rattus norvegicus (Rat), 221 aa 
(fragment). 


70..289 
2..221 


187/220 (85%)
202/220(91%) 


e-104 


CAB43131 


OLFACTORY RECEPTOR - 
Stenella coeruleoalba (Striped 
dolphin), 172 aa (fragment). 


69..240
1..172


136/172 (79%)
148/172 (85%)


1e-73


Q9EPG2 


M51 OLFACTORY RECEPTOR - 
Mus musculus (Mouse), 314 aa. 


16..310
12..305 


131/295 (44%) 
191/295(64%) 


2e-67 


Q9H208 


HP4 OLFACTORY RECEPTOR - 
Homo sapiens (Human), 317 aa 
(fragment). 


16..312 
12..308 


127/297 (42%)
180/297 (59%)


3e-65 


Q920G5 


OLFACTORY RECEPTOR P3 - 
Mus musculus (Mouse), 324 aa. 


16..308
19..311


126/295 (42%)
180/295 (60%)


1e-62



PFam analysis predicts that the NOV62a protein contains the domains shown in the 
Table 62E. 



Table 62E. Domain Analysis of NOV62a 


Pfam Domain 


NOV62a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


7tm_1: domain 1 of 1


46..295 


58/268 (22%) 
179/268 (67%) 


4.6e-38 
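


The identity and similarity cells in the tables above all have the form "matched/aligned (percent)". The following is a minimal sketch, not part of the original disclosure, of how such a cell could be parsed and its percentage recomputed for a consistency check. The function names `parse_match` and `truncated_percent` are hypothetical, and the use of truncation rather than rounding is an assumption based on BLAST-derived rows such as 179/300 (59%); the Pfam rows appear to be rounded instead.

```python
import re

# Pattern for cells such as "129/293 (44%)" or "213/256(83%)".
_MATCH_CELL = re.compile(r"^\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)")


def parse_match(cell: str) -> tuple[int, int, int]:
    """Split a table cell into (matched residues, aligned length, reported percent)."""
    m = _MATCH_CELL.match(cell)
    if m is None:
        raise ValueError(f"not a matched/aligned (percent) cell: {cell!r}")
    matched, aligned, reported = (int(g) for g in m.groups())
    return matched, aligned, reported


def truncated_percent(matched: int, aligned: int) -> int:
    """Recompute the percentage with integer truncation (assumed convention)."""
    return 100 * matched // aligned


if __name__ == "__main__":
    matched, aligned, reported = parse_match("129/293 (44%)")
    print(matched, aligned, reported, truncated_percent(matched, aligned))
```

A cell whose recomputed value disagrees with the reported percentage is a candidate transcription error and is worth checking against the source table.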



Example 63. 



The NOV63 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 63A. 



Table 63A. NOV63 Sequence Analysis 




SEQ ID NO: 189


1042 bp 






NOV63a, 

CG59540-01 DNA Sequence 


GACCTTTCATCACACTCTGGTCATTTACAAACTGTTATTAAGGAATGGGGGACAAGCA 


GCCCTGGGTCACAGAATTCATCCTGGTTGGATTC CAGCTCAGTGCAGAGATGGAG AT C 
TTTCTCTCTTGCATCTTCTCCCTGTTATATCTCTTCAGTCTACTGAGGAATGGCATGA 
ACATGGGACTCATCTGTCTGGATCCCAGACTACACACCCCCATATACTTCTTCCTGTC 
ACACTTGGCCGTCATTGACATATACTATGCTTCCAACAATTTGCTCAACATGCTGGAA 
AACCTAGTGAAACACAAAAAAACTATCTCGTTCATCTCTTGCATTATGCAGATGGCTT 
TGTATTTGACTTTTGCTGCTGCAGTGTGCATGATTTTGGTGGTGATGTCCTATGACAG 
ATTTGTGGCGATCTGCqVTCCCCT^CATTACACTGTCATCATGAACTGGAGAGTGTGC 
ACAGTACTGGCTATTACTTCCTGGGCATGTGGATTTTCCCTGGCCCTCATAAATCTAA 
TTCTCCTTCTAAGGCTGCCCTTCTCTGGGCCCCAGGAGGTGAACCACTTCTTCGGTGA 
AATTCTGTCTGTCCTC7VAACTGGCCTGTGCAGACACCTGGATTAATGAAATTTTTGTC 
TTTGCTGGTGGTGTGTTTGTCTTAGTCGGGCCCCTTTCCTTGATGCTGATCTCCTACA 
TGCG C AT CCTCTTGG CC ATC CTG AAG ATC C AG TCAGGCG AGGG C CACAG AAAG G ACTT 
CTCTACCTGCTCCTCCCACCrCTGTGTGGTCGGGTTCTTCTTTGCCAACGCCATTGTC 
ATGTACATGGCCCCCAAGTC CCGCCATCCCGAGGAGCAG C AGAAGGTCCTTTC CCTGT 
TTTGCAGCCTTTGGAATCAGGTGCTGAACCCCCCTCTGATCTACAGCTTGAGGAATGC 
AGAGG T CAAGAGTG CC CCACAAG AGGGCC ACTGAAG AAGG AG AGG CTGATGTT ACAAT 
CTCAAAGGCACCACGAGGAGAGGGCCTGCTCOSACAAATGGGGAAGTTGGC^ 




ORF Start: ATG at 45 


ORF Stop: TGA at 960 




SEQ ID NO: 190


305 aa 


MW at 34554.8kD 


NOV63a, 

CG59540-01 Protein Sequence 


MGDKQPWVTEFILVGFQLSAEMEIFLSCIFSLLYLFSLLRNGMNMGLICLDPRLHTPI
YFFLSHLAVIDIYYASNNLLNMLENLVKHKKTISFISCIMQMALYLTFAAAVCMILVV
MSYDRFVAICHPLHYTVIMNWRVCTVLAITSWACGFSLALINLILLLRLPFCGPQEVN
HFFGEILSVLKLACADTWINEIFVFAGGVFVLVGPLSLMLISYMRILLAILKIQSGEG
HRKDFSTCSSHLCVVGFFFANAIVMYMAPKSRHPEEQQKVLSLFCSLWNQVLNPPLIY
SLRNAEVKSAPQEGH
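


The ORF coordinates and the protein length reported in Table 63A can be cross-checked arithmetically. The sketch below is illustrative only: it assumes that both reported positions are 1-based and give the first base of the start codon and of the stop codon respectively, and the function name `orf_length_aa` is hypothetical.

```python
def orf_length_aa(start_pos: int, stop_pos: int) -> int:
    """Number of codons from the start codon (inclusive) to the stop codon (exclusive).

    Assumes both positions are 1-based and point at the first base of the codon.
    """
    coding_nt = stop_pos - start_pos
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return coding_nt // 3


if __name__ == "__main__":
    # NOV63a: ORF Start ATG at 45, ORF Stop TGA at 960, reported length 305 aa.
    print(orf_length_aa(45, 960))
```

Under the same assumption, the NOV62a coordinates (ATG at 152, TGA at 1112) give 320 codons, in agreement with the reported 320 aa.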



Further analysis of the NOV63a protein yielded the following properties shown in 
Table 63B. 



Table 63B. Protein Sequence Properties NOV63a 


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.4000 probability located in 
Golgi body; 0.3000 probability located in endoplasmic reticulum (membrane); 
0.3000 probability located in microbody (peroxisome) 


SignalP 
analysis: 


Likely cleavage site between residues 43 and 44 



A search of the NOV63a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 63C.



Table 63C. Geneseq Results for NOV63a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV63a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAU24758


Human olfactory receptor AOLFR259 
- Homo sapiens, 310 aa. 
[WO200168805-A2, 20-SEP-2001] 


1..300 
1..299 


258/300(86%) 
275/300(91%) 


e-146 


AAG72952


exploratorium sequence, SEQ ID NO:
2634 - Homo sapiens, 310 aa.
[WO200127158-A2, 19-APR-2001]


1..299


272/300 (90%)


e-144




AAG72377 


Human OR-like polypeptide query 
sequence, SEQ ID NO: 2058 - Homo 
sapiens, 312 aa. [WO200127158-A2, 
19-APR-2001] 


1..300 
1..299 


255/300(85%) 
272/300 (90%) 


e-144 


AAG72169 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1850 - Homo sapiens, 
312 aa. [WO200127158-A2, 19-APR- 
2001] 


1..300 
1..299 


255/300 (85%) 
272/300(90%) 


e-144 


AAG71994 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1675 - Homo sapiens, 
314 aa. [WO200127158-A2, 19-APR- 
2001] 


1..300 
1..299 


225/300(75%) 
256/300 (85%) 


e-129 



In a BLAST search of public sequence databases, the NOV63a protein was found to
have homology to the proteins shown in the BLASTP data in Table 63D. 



Table 63D. Public BLASTP Results for NOV63a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV63a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


O95047


Olfactory receptor 2A4 - Homo 
sapiens (Human), 310 aa. 


1..299 
1..298 


173/299 (57%)
217/299 (71%)


2e-92


O43885


OLFACTORY RECEPTOR - 
Homo sapiens (Human), 217 aa 
(fragment). 


67..281 
1..216 


154/216 (71%)
182/216 (83%)


5e-88


O43888


OLFACTORY RECEPTOR - 
Homo sapiens (Human), 217 aa 
(fragment). 


67..281 
1..216 


153/216 (70%)
182/216 (83%)


8e-88 


Q96R48 


OLFACTORY RECEPTOR - 
Homo sapiens (Human), 217 aa 
(fragment). 


67..281 
1..216


153/216 (70%)
181/216 (82%)


2e-87 


Q96R47 


OLFACTORY RECEPTOR - 
Homo sapiens (Human), 215 aa 
(fragment). 


67..281 
1..214 


149/215 (69%)
175/215 (81%)


3e-84 



PFam analysis predicts that the NOV63a protein contains the domains shown in the
Table 63E.



Table 63E. Domain Analysis of NOV63a


Pfam Domain 


NOV63a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


7tm_1: domain 1 of 1


47..290 


55/270 (20%) 
174/270 (64%) 


9.7e-25 
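


The Expect Value column mixes several notations: full values such as 9.7e-25, the shorthand "e-146", and "0.0". The sketch below is a hedged illustration of normalising these cells to floating-point numbers. It assumes that a bare "e-146" denotes 1e-146, as is conventional in BLAST report output; the function name `parse_expect` is hypothetical.

```python
def parse_expect(cell: str) -> float:
    """Convert an Expect Value cell to a float.

    Assumption: a cell starting with "e" (for example "e-146") stands for
    a mantissa of 1, i.e. 1e-146.
    """
    text = cell.strip()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


if __name__ == "__main__":
    for cell in ("e-146", "9.7e-25", "0.0"):
        print(cell, "->", parse_expect(cell))
```

Normalised values can then be sorted or thresholded directly, which is convenient when comparing hits across the Geneseq and public BLASTP tables.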



Example 64.

The NOV64 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 64A. 



Table 64A. NOV64 Sequence Analysis 




SEQ ID NO: 191


973 bp 


NOV64a, 

CG59280-01 DNA Sequence 


AGGCACTAAATGAATATCTGTTTAATTCATAAAGTAACAGAGTTTCTCTTCTCTGGAT 
TCCCACAGTTTGAAGATGGTAGCCTCCTCTTCTTCATTCCATTGTTTGTTATCTACAT 
ATTCATTGTCATTGGGAATCTTATTGTATTTTTTGCAGTCAGGGTGGATACCCGTCTC 
CACAACCCCATGTATAATTTTATC^GCATTTTCTCATTTCTGGAGATCTGGTACACAA 
CTGCCACAATTCCC AAGATGCTCTCCATCCT CATCAG CAGGCAGAGGACCATCTCCAT 
GGTTGGTTGCCTCTTGCAGATGTACTTCTTCCATTCACTGGGAAATTCAGAGGGGATT 
TTGTTGACCAC C ATGG CCATTG ATAGGT ACGTTGCCAT CTGT AACCCTCTCCGCTACC 
CAACCATCATGACCCCCGGGCTGTGTGTTCAGCTCTCTGTGGGGTCCTGCATCTTTGG 
CTTTCTTGTGTTGCTCCCAGAGATTGCATGGATTTCCACACTGCCCTTCTGTGGACCC 
AACCAAATCCACCAGATCTTCTGTGATTTTGAACCTGTGCTGCGCTTGGCCTGTACAG 
ACACGTCCATGAT T CTG ATTGAGG ATGTG AT C CATGCTG TG GC C ATTGT ATT CT CT GT 
CCTGATTATTGC CT TTTCTTATATCAGAATCAT CACTGT AATCCTGAGG ATTCCCT CT 
GTTGAAGGCCGCCAGAAGGCCTTTTCTACCTGTGCCGCCCATCTTAGTGTCTTTCTGA 
TGTTCTATGGCAGTGTATCCCTCATGTACCTGCGTTTCTCTGCCACTTTCCCACCGAT 
TTTGGACACAGCTGTTGCACTGATGTTTGCAGTTCTTGCTCCCTTTTTCAACCCTATC 
AT CT ATAGCTTT AG AAAT AAGG ACATG AAG ATTG CAATT AAAAAGCTTTTCTGCCCTC 
AGAAGATGGTTAATTTATCTGTAGATTAATGCTAGCTCATAGGCA 




ORF Start: ATG at 10 


ORF Stop: TAA at 955 




SEQ ID NO: 192


315 aa


MW at 35741.4kD


NOV64a, 

CG59280-01 Protein Sequence 


MNICLIHKVTEFLFSGFPQFEDGSLLFFIPLFVIYIFIVIGNLIVFFAVRVDTRLHNP 
MYNFISIFSFLEIWYTTATIPKMLSILISRQRTISMVGCLLQMYFFHSLGNSEGILLT
TMAIDRYVAICNPLRYPTIMTPGLCVQLSVGSCIFGFLVLLPEIAWISTLPFCGPNQI
HQIFCDFEPVLRLACTDTSMILIEDVIHAVAIVFSVLIIAFSYIRIITVILRIPSVEG
RQKAFSTCAAHLSVFLMFYGSVSLMYLRFSATFPPILDTAVALMFAVLAPFFNPIIYS
FRNKDMKIAIKKLFCPQKMVNLSVD




SEQ ID NO: 193 


929 bp 


NOV64b, 

CG59280-02 DNA Sequence 


TCTT CTTCATT CCATTGTTTGTTATCT ACATATTCATTGTCATTGGGAAT CTTATTGT 
ATTTTTTGCAGTCAGGGTGGATACCCGTCTCCACAACCCCATGTATAATTTTATCAGC 
ATTTTCTCATTTCTGGAGATCTGGTACACAACTGCCACAATTCCCAAGATGCTCTCCA 
TCCTCATCAGCAGGCAGAGGACCATCTCCATGGTTGGTTGCCTCTTGCAGATGTACTT 
CTTCCATTCACTGGGAAATTCAGAGGGGATTTTGTTGACCACCATGGCCATTGATAGG 
TACGTTGCCATCTGTAACCCTCTCCGCTACCCAACCATCATGACCCCCGGGCTCTGTG 
TTCAGCTCTCTGTGGGGTCCTGCATCTTTGGCTTTCTTGTGTTGCTCCCAGAGATTGC 
ATGGATTTCCACACTGCCCTTCTGTGGACCCAACCAAATCCACCAGATCTTCTGTGAT 
TTTGAACCTGTGCTGCGCTTGGCCTGTACAGACACGTCCATGATTCTGATTGAGGATG 
TGATCCATGCTGTGGCCATTGTATTCTCTGTCCTGATTATTGCCTTTTCTTATATCAG 
AATCATC ACTGTAATCCTGAGG ATTCCCTCTGTTG AAGGC CG CCAG AAGGCCTTTT CT 
ACCTGTGCCGCCCATCTTAGTGTCTTTCTGATGTTCTATGGCAGTGTATCCCTCATGT 
ACCTGCGTTTCTCTGCCACTTTCCCACCGATTTTGGACACAGCTGTTGCACTGATGTT 
TGCAGTTCTTG CTCCCTTTTTCAACCCTATCATCTATAG CTTT AG AAATAAGGACATG 
AAGATTGCAATTAAAAAGCTTTTCTGCCCTCAGAAGATGGTTAATTTATCTGTAGATT 
AATGCTAGCTCATAGGCACCTTTCACTGTGGATGTTACTCTAACACAATAAACCATAT 


A 




ORF Start: TTC at 3 


ORF Stop: TAA at 870 




SEQ ID NO: 194


289 aa 


MW at 32772.9kD


NOV64b,

CG59280-02 Protein Sequence 



FFIPLFVIYIFIVIGNLIVFFAVRVDTRLHNPMYNFISIFSFLEIWYTTATIPKMLSI
LISRQRTISMVGCLLQMYFFHSLGNSEGILLTTMAIDRYVAICNPLRYPTIMTPGLCV
QLSVGSCIFGFLVLLPEIAWISTLPFCGPNQIHQIFCDFEPVLRLACTDTSMILIEDV
IHAVAIVFSVLIIAFSYIRIITVILRIPSVEGRQKAFSTCAAHLSVFLMFYGSVSLMY
LRFSATFPPILDTAVALMFAVLAPFFNPIIYSFRNKDMKIAIKKLFCPQKMVNLSVD



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 64B. 



Table 64B. Comparison of NOV64a against NOV64b. 


Protein Sequence 


NOV64a Residues/ 
Match Residues


Identities/ 
Similarities for the Matched Region 


NOV64b 


27..315 
1..289


289/289 (100%)
289/289 (100%) 
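


Table 64B reports that NOV64a residues 27..315 align to NOV64b residues 1 to 289 with 289/289 identities, which implies an ungapped alignment with a constant offset of 26. The sketch below, which is illustrative and not part of the original disclosure, converts a NOV64a residue number to the corresponding NOV64b residue number under that assumption. The constant and function names are hypothetical.

```python
NOV64A_START = 27    # first NOV64a residue of the matched region (Table 64B)
NOV64B_START = 1     # first NOV64b residue of the matched region (Table 64B)
REGION_LENGTH = 289  # aligned length; 289/289 identities implies no gaps


def nov64a_to_nov64b(position: int) -> int:
    """Map a 1-based NOV64a residue number onto NOV64b (ungapped alignment assumed)."""
    offset = position - NOV64A_START
    if not 0 <= offset < REGION_LENGTH:
        raise ValueError(f"NOV64a residue {position} lies outside the matched region")
    return NOV64B_START + offset


if __name__ == "__main__":
    print(nov64a_to_nov64b(27), nov64a_to_nov64b(315))
```

NOV64a residues 1 to 26 have no counterpart in NOV64b, so the function rejects them rather than returning a non-positive index.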



Further analysis of the NOV64a protein yielded the following properties shown in 
Table 64C. 



Table 64C. Protein Sequence Properties NOV64a


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.4000 probability located in 
Golgi body; 0.3000 probability located in endoplasmic reticulum (membrane); 
0.3000 probability located in microbody (peroxisome) 


SignalP 
analysis: 


Likely cleavage site between residues 54 and 55 



A search of the NOV64a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 64D.



Table 64D. Geneseq Results for NOV64a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV64a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG71805


Human olfactory receptor 
polypeptide, SEQ ID NO: 1486 - 
Homo sapiens, 256 aa. 
[WO200127158-A2, 19-APR-2001] 


59..314 
1..256 


255/256 (99%) 
255/256 (99%) 


e-145 


AAG71803


Human olfactory receptor
polypeptide, SEQ ID NO: 1484 -
Homo sapiens, 315 aa.
[WO200127158-A2, 19-APR-2001] 


9..311 
9..311 


243/303 (80%) 
267/303 (87%) 


e-143


AAU24658


Human olfactory receptor AOLFR156
- Homo sapiens, 331 aa.
[WO200168805-A2, 20-SEP-2001] 


9..311 
25..327 


240/303 (79%) 
264/303 (86%) 


e-140 


AAG71807 


Human olfactory receptor 
polypeptide, SEQ ID NO: 1488 -
Homo sapiens, 319 aa. 
[WO200127158-A2, 19-APR-2001] 


9..313 
9..313 


222/305 (72%) 
259/305 (84%) 


e-131 


AAU24721 


Human olfactory receptor AOLFR220 
- Homo sapiens, 343 aa. 
[WO200168805-A2, 20-SEP-2001] 


9..308 
37..336


209/300 (69%) 
242/300 (80%) 


e-119 



In a BLAST search of public sequence databases, the NOV64a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 64E. 



Table 64E. Public BLASTP Results for NOV64a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV64a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9EPG2 


M51 OLFACTORY RECEPTOR -
Mus musculus (Mouse), 314 aa. 


1..302 
4..303 


137/303 (45%) 
194/303 (63%) 


2e-71 


Q9EPV0 


M50 OLFACTORY RECEPTOR 
(OLFACTORY RECEPTOR M50) - 
Mus musculus (Mouse), 316 aa. 


6..302 
4..301 


132/298(44%) 
191/298(63%) 


3e-71 


Q9EPG1 


M50 OLFACTORY RECEPTOR -
Mus musculus (Mouse), 316 aa. 


6..302 
4..301 


130/298(43%) 
190/298 (63%) 


2e-70 


Q9WU86 


ODORANT RECEPTOR S1 - Mus
musculus (Mouse), 324 aa. 


1..310 
12..321 


133/313 (42%) 
190/313 (60%) 


4e-69 


Q96KK4 


DJ994E9.5 (OLFACTORY 
RECEPTOR, FAMILY 10, 
SUBFAMILY C, MEMBER 1 
(HS6M1-17)) - Homo sapiens 
(Human), 306 aa. 


9..314 
2..306 


137/307 (44%) 
189/307(60%) 


9e-68 



PFam analysis predicts that the NOV64a protein contains the domains shown in the 
Table 64F. 



Table 64F. Domain Analysis of NOV64a


Pfam Domain 


NOV64a Match Region 


Identities/
Similarities
for the Matched Region


Expect Value


7tm_1: domain 1 of 1


41..289


51/269 (19%)
179/269 (67%)


2.2e-33





Example 65. 

The NOV65 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 65A.



Table 65A. NOV65 Sequence Analysis 




SEQ ID NO: 195


972 bp 


NOV65a, 

CG59568-01 DNA Sequence 


GCATGGTGATCCTGTCCTGGGAAAACCAAACGATGAGAGTGGAATTCGTGCTTCAAGG 

ATTCTCTTCCATCAGACAGTTAAATATTTTCCTCTTTATGATAATTTTAGTTTTCTAC 

ATCTTAACTGTTTCTGGAAACATCCTCATTGTCCTTCTAGTTTTAGTCAGACATCATC 

TCCACACCCCTATGTACTTCCTCCTGGTGAACTTGTCCTGTCTGGAGATCTGGTATAC 

CTCTAACATCATCCCCAAAATGTTGCTGATTATCATAGCTGAAGAGAAGACTATCTCT 

GTGGCTGGCTGGCTGGCACAATTCTACTTCTTCGGATCCCTGGCTGCCACGGAGTGCC 

TCTTGCTCACTGTGATGTCCTATGATCGCTACCTAGCCATCTGCCAGCCTCTTTGCTA 

CCGTGTCCTCATGACTGGCCCCCTTTGCATCAGGCTAGCTGCTGGCTCTTGGTTCTGC 

TGCTTCCTCCTTACAGCAATCACCATGGTCTTGCTATGTAGACTAACCTTCTGTGGAC 

CCTATGAAACTGATCACTTCTTTTGTGACTTCACCCCTCTGGTTCATCTCTCCTGCAT 

GGATACCTCAGTGACTGAGACCATTGCCTTTGCCACCTCTTCTGCAGTAACTCTGATC 

CCATTTCTCCTCATTGTAGCCTCCTACTCCTGCGTCCTTTCTGCTATCCTAAGAATCC 

CATCTTGCACAGGCCAGAAAAAGGCCTTCTCCACCTGCTCTTCCCACCTCACTGTGGT 

CATAGTGTTTTATGGGACACTGATTGCCACATACCTTGTGCCCTCAGCCAACTCATCC 

CAACTCTTGTGCAAAGGGTCCTCTCTGCTCrrACATCATCCTGACACCCATG 

C CATCATTT AT AG CCTG AG AAAT AG AG ACAT C CATG AAGCT CTG AAG AAGTG C TTGAG 

GAAGAAGTCAGGTGTTTGCCTTAGATAATACGAAAAGGAAAAAA 




ORF Start: ATG at 3 


ORF Stop: TAA at 954 




SEQ ID NO: 196


317 aa MW at 35713.4kD 


NOV65a, 

CG59568-01 Protein Sequence 


MVILSWENQTMRVEFVLQGFSSIRQLNIFLFMIILVFYILTVSGNILIVLLVLVRHHL
HTPMYFLLVNLSCLEIWYTSNIIPKMLLIIIAEEKTISVAGWLAQFYFFGSLAATECL
LLTVMSYDRYLAICQPLCYRVLMTGPLCIRLAAGSWFCCFLLTAITMVLLCRLTFCGP
YETDHFFCDFTPLVHLSCMDTSVTETIAFATSSAVTLIPFLLIVASYSCVLSAILRIP
SCTGQKKAFSTCSSHLTVVIVFYGTLIATYLVPSANSSQLLCKGSSLLYIILTPMFNP
IIYSLRNRDIHEALKKCLRKKSGVCLR



Further analysis of the NOV65a protein yielded the following properties shown in 
Table 65B. 



Table 65B. Protein Sequence Properties NOV65a 


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.4000 probability located in 
Golgi body; 0.3888 probability located in mitochondrial inner membrane; 0.3030 
probability located in mitochondrial intermembrane space 


SignalP 
analysis: 


Likely cleavage site between residues 45 and 46 



A search of the NOV65a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 65C.



Table 65C. Geneseq Results for NOV65a


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV65a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG72527 


Human OR-like polypeptide query 
sequence, SEQ ID NO: 2208 - Homo 
sapiens, 316 aa. [WO200127158-A2, 
19-APR-2001] 


1..316 
1..316 


315/316(99%) 
315/316 (99%)


0.0 


AAG72231 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1912 - Homo sapiens, 
316 aa. [WO200127158-A2, 19-APR- 
2001] 


1..316 
1..316 


315/316 (99%)
315/316(99%) 


0.0 


AAG72084 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1765 - Homo sapiens, 
316 aa. [WO200127158-A2, 19-APR- 
2001] 


1..316 
1..316 


315/316 (99%)
315/316(99%) 


0.0 


AAG72700 


Murine OR-like polypeptide query 
sequence, SEQ ID NO: 2382 - Mus 
musculus, 314 aa. [WO200127158- 
A2, 19-APR-2001] 


1..308 
3..308 


154/308 (50%)
208/308 (67%)


2e-83 


AAG71814 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1495 - Homo sapiens, 
317 aa. [WO200127158-A2, 19-APR- 
2001] 


8..311 
5..308 


142/304(46%) 
208/304(67%) 


7e-79 



In a BLAST search of public sequence databases, the NOV65a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 65D. 



Table 65D. Public BLASTP Results for NOV65a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV65a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Portion


Expect 
Value 


Q9GZK7 


Olfactory receptor 11A1 (Hs6M1-
18) - Homo sapiens (Human), 315
aa.


1..308 
1..306 


147/308 (47%) 
202/308 (64%) 


4e-77 


O13036


CHICK OLFACTORY 
RECEPTOR 7 - Gallus gallus 
(Chicken), 323 aa. 


7..311
4..308


139/305 (45%)
198/305 (64%)


1e-76


Q9JKA6 


OLFACTORY RECEPTOR P2 - 
Mus musculus (Mouse), 315 aa. 


4..313 
1..311


143/311 (45%)
194/311 (61%)


1e-75


Q9WU86


ODORANT RECEPTOR S1 - Mus
musculus (Mouse), 324 aa.


14..308
21..315


144/295(48%) 
189/295 (63%) 


2e-75 


Q9UGF6 


Olfactory receptor 5V1 (Hs6M1-
21) - Homo sapiens (Human), 321
aa.


7..305 
4..302


138/299 (46%)
199/299 (66%)


5e-75 



PFam analysis predicts that the NOV65a protein contains the domains shown in the 
Table 65E. 



Table 65E. Domain Analysis of NOV65a 


Pfam Domain 


NOV65a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


granulin: domain 1 of 1 


144..155 


7/13 (54%) 
11/13(85%) 


1.7 


Trypan_glycop: domain 1 of
1


218..241


6/24 (25%)
21/24 (88%)


7.9 


7tm_1: domain 1 of 1


44..293 


53/268 (20%) 
172/268 (64%) 


1.5e-31 



Example 66. 

The NOV66 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 66A. 



Table 66A. NOV66 Sequence Analysis 




SEQ ID NO: 197


987 bp 


NOV66a, 

CG59224-01 DNA Sequence 


CATCTTCCTATGTGTCATGTCTCCTCTTAATGACACAAAAATGGAAGTCCTTAGATTC 


CTCCTTATCGGGATCACTGGACTGG AGAAAAGTCGC ACCTGG ATATC CATTC CTTTCT 
TATCTGTGTACCTT CTTTCTTGG ATGGGTAATTTTAC CGTCCTCTTTTTTATCAAGAC 
AGAGCAAAG CCTCCATGAACCTATGTATTATTTGCTTTCCATGCTCT CCATCTCTG AC 
CTAGGGCTGTCTCTGTCTTCCTTACCCATCACTTTGGGACTATTCCTATTTGATGTCC 
ATGAAATTCATGCAGCTCCATGCTTTGCCCAGGAATTTTTTATCCATCTGTTTACAGT 
CAGTGAAGCCTCTGTACTGTCTGTAATGGCATTTGACTGGTATGTGGCAATCCACAGT 
CCTTTGAGATACAGCACTATCTTAACTAGTCCCAGAGCCATCAAAACAGGGGTTCTTC 
TGACTTCCAAGAATGTTCTTTTGATCCTTCCACTGCCCTTTCTCTTGCAAAGGCTGAG 
ATATTGTCATCAAAACCTGCTCTCCCACTCCTATTGTCTCCACCAGGATGTCATGAAG 
CTGATGTGTTCTGACAACACAGTCAATGTTGTCTACGGACTCTGTGCAGGACTTTCTA 
CTATGCTGGACTTGGTGTTTATTACCTTCTCCTATATGATTTTAAGGGCTGTACTGGG 
AATTGCTACCCCCAGACAGCAGTTCAAGGCCCTCAACACGTGCATCTCTCACATCTGT 
GCTGTGCTTATCTTCTATGTGCCCACG CTGAGTG CTG CCATGCTCCACCAGTTTG CCA 
GGGATGTGTCTCCTATGATCCACGTCCTCATGGCTGATATTTTTCTGCTGGTGCCACC 
CCTGTTGAATCCCATCGTGTACTGTGTGAAGACCCACCAAATCCGAGAAAAGGTTGTG 
GGGAAACTTTGTCCAAAAGTAAGTTOATCAAAGGAATGAGAAAGGGAATGAATGTATA 




A 








ORF Start: ATG at 17 


ORF Stop: TGA at 953 




SEQ ID NO: 198


312 aa 


MW at 35250.7kD 


NOV66a, 

CG59224-01 Protein Sequence 


MSPLNDTKMEVLRFLLIGITGLEKSRTWISIPFLSVYLLSWMGNFTVLFFIKTEQSLH
EPMYYLLSMLSISDLGLSLSSLPITLGLFLFDVHEIHAAPCFAQEFFIHLFTVSEASV
LSVMAFDWYVAIHSPLRYSTILTSPRAIKTGVLLTSKNVLLILPLPFLLQRLRYCHQN
LLSHSYCLHQDVMKLMCSDNTVNVVYGLCAGLSTMLDLVFITFSYMILRAVLGIATPR
QQFKALNTCISHICAVLIFYVPTLSAAMLHQFARDVSPMIHVLMADIFLLVPPLLNPI
VYCVKTHQIREKVVGKLCPKVS



Further analysis of the NOV66a protein yielded the following properties shown in 
Table 66B. 



Table 66B. Protein Sequence Properties NOV66a 


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.4000 probability located in 
Golgi body; 0.3000 probability located in endoplasmic reticulum (membrane); 
0.2007 probability located in mitochondrial inner membrane 


SignalP 
analysis: 


Likely cleavage site between residues 50 and 51 



A search of the NOV66a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 66C.



Table 66C. Geneseq Results for NOV66a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV66a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG72488 


Human OR-like polypeptide query 
sequence, SEQ ID NO: 2169 - Homo
sapiens, 319 aa. [WO200127158-A2,
19-APR-2001]


1..312
1..313


309/313 (98%)
309/313 (98%)


e-176 


AAG71557 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1238 - Homo sapiens,
319 aa. [WO200127158-A2, 19-APR-
2001]


1..312
1..313


309/313 (98%)
309/313 (98%)


e-176 


AAU24573 


Human olfactory receptor AOLFR63 - 
Homo sapiens, 313 aa. 
[WO200168805-A2, 20-SEP-2001] 


1..310 
1..311 


186/311 (59%)
246/311 (78%)


e-109 


AAG71558 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1239 - Homo sapiens,
313 aa. [WO200127158-A2, 19-APR-
2001]


1..310
1..311


185/311 (59%)
245/311 (78%)


e-108 


AAU24682 


Human olfactory receptor AOLFR181
- Homo sapiens, 312 aa.
[WO200168805-A2, 20-SEP-2001]


1..307
1..306


188/308 (61%)
237/308 (76%)


e-106



In a BLAST search of public sequence databases, the NOV66a protein was found to
have homology to the proteins shown in the BLASTP data in Table 66D. 



Table 66D. Public BLASTP Results for NOV66a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV66a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Portion


Expect
Value


AAL35109


PROSTATE-SPECIFIC G
PROTEIN-COUPLED RECEPTOR
RA1C - Mus musculus (Mouse), 320
aa.


11..303


199/293 (67%)




O88628


Olfactory receptor 51E2 (G-protein
coupled receptor RA1c) - Rattus
norvegicus (Rat), 320 aa.


14..304
11..303


141/293 (48%) 
200/293 (68%) 


2e-77 


CAC38935 


SEQUENCE 9 FROM PATENT
WO0131014 - Homo sapiens
(Human), 318 aa.


5..304
6..306


145/302 (48%)
206/302 (68%) 


2e-77 


CAC37756 


SEQUENCE 1 FROM PATENT 
WO0125434 - Homo sapiens
(Human), 317 aa. 


5..304 
5..305 


145/302 (48%) 
206/302 (68%) 


3e-77 


Q9H255 


Olfactory receptor 51E2 (Prostate 
specific G-protein coupled receptor) 
(HPRAJ) - Homo sapiens (Human),
320 aa. 


14..304 
11..303 


139/293 (47%) 
198/293 (67%) 


2e-76 



PFam analysis predicts that the NOV66a protein contains the domains shown in the 
Table 66E. 



Table 66E. Domain Analysis of NOV66a 


Pfam Domain 


NOV66a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


7tm_1: domain 1 of 2


43..151 


30/111 (27%) 
73/111 (66%) 


6.3e-14 


7tm_1: domain 2 of 2


212..292 


16/92 (17%) 
52/92 (57%) 


0.052 



Example 67. 



The NOV67 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 67A.



Table 67A. NOV67 Sequence Analysis




SEQ ID NO: 199


994 bp 


NOV67a, 

CG59222-01 DNA Sequence 


CACAATGTCTGTCTTCAATAGTTCTGCCTTATACCCTCGCTTCCTCCTAACGGGCCTC 
TCAGGCCTTG AAAG CAGATATGACTTG ATTTCCCTG CC CATCTTCTTGGTTT ATGC CA 
CCTCAATTGC CGGGAACATT AGCAT CCTCTTCATTATCAG AACTGAGTCTTCCCTCCA 
CCAACCGATGTATTACTTTCTGTCAATGCTGGCATTCACTGACCTGGGCCTATCTAAC 
ACTACCTTACCTACCATGTTCAGTGTCTTCTGGTTCCATGCCCGGGAGATCTCCTTCA 
ATGCTTGTCTGGTCCAAATGTACTTCATTCATGTTTTCTCGATTATTGAGTCAGCTGT 

GCCATCCTAACCAATGATGTAATCATTGGGATTGGGTTGGCAATTGCTGGAAGGGCCT 
TGGCTCTGGTCTTTCCAGCTTCTTTCCTCTTGAAGAGGCTTCAATATCATGATGTCAA 
TATTCTGTCCT ACCTCTTCTG C C TG CAC CAGG ACCTCATAAAGACGACTGTATCCAAC 
TGT CGAGTCAG CAG CATCTATGGCCTCATGGTGGTCATC TGTTCCATGGGACTTG ATT 
CAGTGCTTCTCCTCCTCTCCTATGTCCTCATCCTGGGCACAGTGTTGAGTATAGCCTC 
CAAGGCAGAGAGAGTGAGAGCCCTCAATACTTGCATCTCCCACATCTGTGCTGTACTC 
ACCTTCTATACACCAATGATTGGGCTATCTATGATCCATCGCTATGGACAGAATGCTT 
CCTCAATTGTCCATGTGCTGATGGCCAATGTCTACTTGCTGGTTCCACCTCTCATGAA 
C CCCGTTGT CTAC AGTGTT AAG ACCAAGCAGATTCGTG ACAGAATCTTCAATAAATTC 
AAG AAACATG AAG TG T AGATG ACAG AG ATTCTG AAA C AT AACTTTC CCT CCATT C C C C 


ATATATTT 




ORF Start: ATG at 5 


ORF Stop: TAG at 944 




SEQ ID NO: 200 


313 aa 


MW at 35044.2kD 


NOV67a, 

CG59222-01 Protein Sequence 


MSVFNSSALYPRFLLTGLSGLESRYDLISLPIFLVYATSIAGNISILFIIRTESSLHQ 
PMYYFLSMLAFTDLGLSNTTLPTMFSVFWFHAREISFNACLVQMYFIHVFSIIESAVL
LAMAFDCFIAICEPLRYAAILTNDVIIGIGLAIAGRALALVFPASFLLKRLQYHDVNI
LSYLFCLHQDLIKTTVSNCRVSSIYGLMVVICSMGLDSVLLLLSYVLILGTVLSIASK
AERVRALNTCISHICAVLTFYTPMIGLSMIHRYGQNASSIVHVLMANVYLLVPPLMNP
VVYSVKTKQIRDRIFNKFKKHEV



Further analysis of the NOV67a protein yielded the following properties shown in 
Table 67B. 



Table 67B. Protein Sequence Properties NOV67a 


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.4047 probability located in 
mitochondrial inner membrane; 0.4000 probability located in Golgi body; 0.3480 
probability located in mitochondrial intermembrane space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV67a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 67C.



Table 67C. Geneseq Results for NOV67a


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV67a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG72605 


Human OR-like polypeptide query 
sequence, SEQ ID NO: 2286 - Homo
sapiens, 318 aa. [WO200127158-A2, 
19-APR-2001] 


1..309 
4..313 


295/310(95%) 
298/310(95%) 


e-163 


AAG71519 


Human olfactory receptor polypeptide,
SEQ ID NO: 1200 - Homo sapiens,
318 aa. [WO200127158-A2, 19-APR-
2001] 


1..309 
4..313 


295/310(95%) 
298/310(95%) 


e-163 


AAU24683 


Human olfactory receptor AOLFR182
- Homo sapiens, 314 aa.
[WO200168805-A2, 20-SEP-2001]


5..308
9..312


178/304 (58%)
235/304(76%) 


e-102 


AAG71715 


Human olfactory receptor polypeptide,
SEQ ID NO: 1396 - Homo sapiens, 
314 aa. [WO200127158-A2, 19-APR- 
2001] 


5..308 
9..312 


178/304 (58%)
235/304 (76%)


e-102 


ABB44526 


Human GPCR4a polypeptide SEQ ID
NO 11 - Homo sapiens, 315 aa.
[WO200174904-A2, 11-OCT-2001]


5..308
6..309


169/304 (55%)
227/304 (74%)


2e-96 



In a BLAST search of public sequence databases, the NOV67a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 67D. 



Table 67D. Public BLASTP Results for NOV67a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV67a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9H344


Olfactory receptor 51I2
(HOR5'beta12) - Homo sapiens
(Human), 312 aa. 


13..308 
12..307 


154/296 (52%) 
221/296 (74%) 


2e-91 


Q9H2C8 


ODORANT RECEPTOR 
HOR3'BETA1 - Homo sapiens
(Human), 321 aa. 


2..308 
10..316 


160/307 (52%) 
216/307 (70%) 


5e-89 


Q9H343 


Olfactory receptor 51I1
(HOR5'beta11) - Homo sapiens
(Human), 314 aa. 


5..312 
5..313 


156/309(50%) 
223/309 (71%) 


9e-89


AAL35109


PROSTATE-SPECIFIC G
PROTEIN-COUPLED RECEPTOR
RA1C - Mus musculus (Mouse), 320
aa.


13..309
11..307


148/297(49%) 
207/297 (68%) 


2e-86 


Q924X8 


OLFACTORY RECEPTOR S85 - 
Mus musculus (Mouse), 314 aa. 


2..304 
3..305 


150/303(49%) 
221/303 (72%) 


1e-85



PFam analysis predicts that the NOV67a protein contains the domains shown in the 
Table 67E. 



Table 67E. Domain Analysis of NOV67a 


Pfam Domain 


NOV67a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


7tm_1: domain 1 of 1


42..138


24/99 (24%) 
67/99 (68%) 


7.8e-14 



Example 68. 

The NOV68 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 68A.



Table 68A. NOV68 Sequence Analysis 




SEQ ID NO: 201 


981 bp 


NOV68a, 

CG59220-01 DNA Sequence 


GCAATGAGAAACCGCAGTGTTGTCCCTGAGTTTGTCCTCCTCGGG CTGT C AGCTGG CC 

CCCAGACCCAGACTCTGCTCTT^GTGCTGTTCGTGGTGATTTGCCTCCTGACTGTGAT 

GGGAAACOTGCTGCTGCTGGTGGTGATTAATGCTGATTCTTGCCTCCACACACCCATG 

TACTTCTTCCTGGGACAATTGTCCTTCTTGGATCTCTGCCATTCCTCTGTCACTGCAC 

CT AAG CTGTTGGAGAACCTCCTGTCTGAGAAGAAAACCATCTCAGTAG AGGG CTGCAT 

GGCTCAGGTCTTCTTTGTGTTTGCCACTGGGGGCACTGAATCCTCCCTGCTTGCTGTG 

ATGGC CT ATG ACCGCTATGTTGC CATCAGCTCTCCTTTGCT CTATGG CCAAGTGATGA 

ACAGACAGCTGTGTTCAGGGCTGGTGGGGGGCTCATGGGGCTTGG CTTTTCTGGATG C 

CCTCATCAATATCCTTGTAGCTCTCAATTTAGACTTCTGTGAGGCTCAAAATATCCAC 

CACTTCAGCTGTGAGCTGCCCTCTCTCTATCCTTTGTCTTGCTCTGATGTGTCAGCAA 

GTTTTACCACCCTGCTCTGCTCCAGCTTCCTGCATTTCTTTGGAAATTTTCTCATGAT 

ATTCTTGTCTTATATTTGCATTTTGTCCACCATCCrGAGGATCAGCT 

AGAAGCAAAGCCTTCTCCACCTGCTCCTCCCACCTCACTGCAGTGATTTTCTTTTATG 

GCTCCGGATTACTCCGCTATCTCATGCCAAATTCJVGGATCCATTCAAGAGCTGATCTT 

CTCCTTGCAGTACAGCGTCATOVCTCCCATGCTGAATCTCCTCATTTACAGCCTGAAG 

AACAGGGAGGTGAAGGCAGCTGTGAGAAGAACATTGAGAAAATATTTCTAGTGTTTCA 

ATAGACTTATGAAATCAGAATGATGAGGGAACTGGATAGAACTGCAACAAGCA 




ORF Start: ATG at 4 


ORF Stop: TAG at 919 




SEQ ID NO: 202 


305 aa MW at 33732.3kD 


NOV68a, 

CG59220-01 Protein Sequence 


MRNRSVVPEFVLLGLSAGPQTQTLLFVLFVVICLLTVMGNLLLLVVINADSCLHTPMY
FFLGQLSFLDLCHSSVTAPKLLENLLSEKKTISVEGCMAQVFFVFATGGTESSLLAVM
AYDRYVAISSPLLYGQVMNRQLCSGLVGGSWGLAFLDALINILVALNLDFCEAQNIHH
FSCELPSLYPLSCSDVSASFTTLLCSSFLHFFGNFLMIFLSYICILSTILRISSTTGR 
SKAFSTCSSHLTAVIFFYGSGLLRYLMPNSGSIQELIFSLQYSVITPMLNLLIYSLKN 
REVKAAVRRTLRKYF 



Further analysis of the NOV68a protein yielded the following properties shown in 
Table 68B. 

Table 68B. Protein Sequence Properties NOV68a


PSort 
analysis: 


0.6400 probability located in plasma membrane; 0.4600 probability located in 
Golgi body; 0.3700 probability located in endoplasmic reticulum (membrane); 
0.1000 probability located in endoplasmic reticulum (lumen) 


SignalP 
analysis: 


Likely cleavage site between residues 50 and 51 



A search of the NOV68a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 68C. 



Table 68C. Geneseq Results for NOV68a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV68a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for
the Matched 
Region 


Expect 
Value 


AAU24771 


Human olfactory receptor AOLFR328 
- Homo sapiens, 312 aa. 
[WO200168805-A2, 20-SEP-2001] 


3..304 
5..306 


212/302 (70%)
251/302 (82%)


e-120 


AAG98585 


Mouse olfactory receptor 7 - Mus 
musculus domesticus, 214 aa. 
[WO200146262-A2, 28-JUN-2001] 


66..279 
1..214 


144/214 (67%)
169/214 (78%)


1e-78


AAG72680 


Murine OR-like polypeptide query 
sequence, SEQ ID NO: 2362 - Mus 
musculus, 337 aa. [WO200127158- 
A2, 19-APR-2001] 


3..305 
20..324 


148/305 (48%)
201/305 (65%)


3e-74 


AAG71546 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1227 - Homo sapiens, 
315 aa. [WO200127158-A2, 19-APR- 
2001] 


3..301 
5..306 


143/302 (47%)
201/302 (66%)


2e-73 


AAG66701 


Human GPCR1 polypeptide - Homo 
sapiens, 311 aa. [WO200160865-A2, 
23-AUG-2001] 


3..301 
5..306 


143/302 (47%)
201/302 (66%)


2e-73 
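
The Identities/Similarities cells in Table 68C pair a match count with an alignment length and a percentage. The sketch below parses such a cell and recomputes the percentage, assuming it is truncated rather than rounded (an inference from the tabulated values, not something the source states). Function names are hypothetical.

```python
import re


def parse_fraction(cell: str) -> tuple[int, int, int]:
    """Parse a cell such as '144/214 (67%)' into (matches, length, percent)."""
    m = re.fullmatch(r"\s*(\d+)/(\d+)\s*\((\d+)%\)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    matches, length, percent = map(int, m.groups())
    return matches, length, percent


def truncated_percent(matches: int, length: int) -> int:
    """Integer percentage with the fractional part dropped (assumed convention)."""
    return (100 * matches) // length


matches, length, reported = parse_fraction("144/214 (67%)")
print(truncated_percent(matches, length), reported)  # 67 67
```

A check of this kind is mainly useful for spotting cells that were damaged during text extraction.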



In a BLAST search of public sequence databases, the NOV68a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 68D. 

Table 68D. Public BLASTP Results for NOV68a


Protein 
Accession 
Number 


Protein/Organism/Length


NOV68a
Residues/
Match
Residues


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9JM36 


OLFACTORY RECEPTOR - Mus
musculus domesticus (western
European house mouse), 214 aa
(fragment).


66..279
1..214


144/214 (67%) 
169/214 (78%) 


5e-78 


O90Z18 


OLFACTORY RECEPTOR - Mus 
musculus (Mouse), 312 aa. 


3..299 
5..303


142/299 (47%)
193/299(64%) 


2e-72 


Q9EPG6 


Bl OLFACTORY RECEPTOR - 
Mus musculus (Mouse), 314 aa. 


3..299
5..303


140/299(46%) 
196/299(64%) 


2e-72 


P23266 


Olfactory receptor-like protein F5 - 
Rattus norvegicus (Rat), 313 aa. 


3..305
5..309


142/305 (46%)
196/305 (63%)


9e-72 


Q9EQA3 


ODORANT RECEPTOR K30 - Mus 
musculus (Mouse), 311 aa.


3..305
5..310


143/306 (46%) 
202/306 (65%) 


2e-71 



PFam analysis predicts that the NOV68a protein contains the domains shown in the 
Table 68E. 



Table 68E. Domain Analysis of NOV68a 


Pfam Domain 


NOV68a Match Region


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


7tm_1: domain 1 of 1


39..286 


54/268 (20%) 
169/268 (63%) 


1.7e-29 
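
The match region in Table 68E is given as an inclusive residue range. As a rough illustration, the sketch below converts such a range into a span length and into the fraction of the 305-residue NOV68a polypeptide that it covers. The helper names are hypothetical and the inclusive-range reading is an assumption.

```python
def region_span(region: str) -> int:
    """Length of an inclusive residue range written as 'start..end'."""
    start, end = (int(part) for part in region.split(".."))
    if end < start:
        raise ValueError(f"range is reversed: {region!r}")
    return end - start + 1


def coverage(region: str, protein_length: int) -> float:
    """Fraction of the protein covered by the matched region."""
    return region_span(region) / protein_length


print(region_span("39..286"))               # 248
print(round(coverage("39..286", 305), 2))   # 0.81
```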



Example 69. 

The NOV69 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 69A.



Table 69A. NOV69 Sequence Analysis




SEQ ID NO: 203


957 bp 


NOV69a, 

CG59218-01 DNA Sequence


GTCCACAATGGCCAATCAGACTGTGGTGACTGAGTTCTTCCTCCAAGGCCTGACGGAT 
ACCAAAGAGCTTCAGGTGGCTGTTTTTCTGCTCCTGCTGCTTGCCTACCTTGTGACTG 
TCTCTGGGAACCTGATCATC^TCAGCCTCACCTTC 

TATGTACTTATTTCTCCAGAATCTGTCCTGCTTAGAAATTTGGTTCCAGACAGTCATC 
GTGCCCAAGATGCTGCTCAACJVTTGCCATGGGGACCAAGACCGTTAGCTTTGCTGGG^ 
GCAT^ACCCAGGACTTTT^CTTTCCACATCTTCTGGGGGC 

CACAGCCATGGCCTATGACCAGTATATTGCCATCTGCAAGCCCCTCCACTACCCCATG 
CTC ATAAGTAGTAGAGTCTGCACACAGCTCATCCTCACCTG CTGG CT ACTAGGTTTCT 
CCTTCATCATCATGCCTGT CATCCTGACCAGTC AGCTTCCATTCTGTGAT ACC CACAT 
CAAGCATTTCTTCTGTGACTACACGCCTCTAATGGAGGTGG.TCTGCAGTGGGCCAAAG 
GTGCTGGAG ATGGTGGATTTTAC CCTGGCCTTAGT AG CACTGTTTGGCACCTTGGTAC 
TCATCACCCTGTCCTATGTCCAGATCATCCAGACAATTGTCAGAATCCCCGCTGTCCA
GGAGAGGAAGAAGGCTTTCTCTACCTGTTCCTCTCATGTCATTATGGTTACCATGTGT 
TATGACAGCTGCTTCTTTATGTATGTCAAGCCCTCTCCAGGAAAGTGGGTTGATGTCA 
AC AAGGG AG TGTCTC T AAT C AAT ACAATT ATTG CC CC ACTG TT AAAT C C CTTCATCTG 
TACfCTGAGGAACCAACAAGTTAAGCAGGTAATGAAAGACCTAGTCAGAAAAATGACT 
TTGTTCCAAAATAAATAAGGGCCCTAAAA 




ORF Start: ATG at 8 


ORF Stop: TAA at 944 




SEQ ID NO: 204 


312 aa MW at 35358.1kD 


NOV69a, 

CG59218-01 Protein Sequence


MANQTVVTEFFLQGLTDTKELQVAVFLLLLLAYLVTVSGNLI 1 I SLTLLDTRLQTSMY 
LFLQNLS CLE I WFQTVI VPKMLLNI AMGTKTVSFAGC ITQD FFF PHLLGATE F FLLTA 
MA YDQY I AI CKPLHY PML I S S RVCTQL I LTCWLLG FSFIIMPVI LT SQLP FCDTH I KH 
FFCDYTPLMEWCSG PKVLEMVDFTLALVALFGTLVLITLS YVQ 1 1 QT I VRI PAVQE R 
KKAFSTCSSHVIMVTMCYDSCFFMYVKPS PGKWVDVNKGVS LI NTI I APLLN PF I CTL 
RNQQVKQ VMKDLVRKMT LFQN K 



Further analysis of the NOV69a protein yielded the following properties shown in 
Table 69B. 



Table 69B. Protein Sequence Properties NOV69a 


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.4000 probability located in 
Golgi body; 0.3000 probability located in endoplasmic reticulum (membrane); 
0.0300 probability located in mitochondrial inner membrane 


SignalP 
analysis: 


Likely cleavage site between residues 40 and 41 
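
The PSort entry in Table 69B lists candidate localizations with associated probabilities. A minimal sketch for turning that text into a lookup table and picking the highest-scoring compartment is shown here; the parsing format is assumed from the table text, and the function name is illustrative.

```python
def parse_psort(text: str) -> dict[str, float]:
    """Map each listed compartment to its PSort probability."""
    result = {}
    for entry in text.split(";"):
        prob, _, location = entry.strip().partition(" probability located in ")
        result[location.strip()] = float(prob)
    return result


# Values as printed in Table 69B
psort = (
    "0.6000 probability located in plasma membrane; "
    "0.4000 probability located in Golgi body; "
    "0.3000 probability located in endoplasmic reticulum (membrane); "
    "0.0300 probability located in mitochondrial inner membrane"
)
scores = parse_psort(psort)
best = max(scores, key=scores.get)
print(best, scores[best])  # plasma membrane 0.6
```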



A search of the NOV69a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 69C. 



Table 69C. Geneseq Results for NOV69a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV69a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for
the Matched 
Region 


Expect 
Value 


AAG72538 


Human OR-like polypeptide query 
sequence, SEQ ID NO: 2219 - Homo
sapiens, 313 aa. [WO200127158-A2,
19-APR-2001] 


1..312 
1..313 


284/317 (89%)
293/317 (91%)


e-157 


AAG72229 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1910 - Homo sapiens,
313 aa. [WO200127158-A2, 19-APR-
2001]


1..312 
1..313 


284/317 (89%)
293/317 (91%)


e-157 


AAU24761 


Human olfactory receptor
AOLFR112B - Homo sapiens, 309 aa.
[WO200168805-A2, 20-SEP-2001]


1..306 
1..306 


173/307 (56%)
227/307 (73%) 


2e-96 


AAU24765


Human olfactory receptor 
AOLFR225B - Homo sapiens, 309 aa.
[WO200168805-A2, 20-SEP-2001] 


1..306 
1..306 


166/307 (54%)
227/307 (73%)


2e-94 


AAG66353 


GPCR partial protein sequence - 
Unidentified, 313 aa. [WO200155179-
A2, 02-AUG-2001] 


1..309 
1..310 


160/311 (51%)
209/311 (66%)


4e-87 



In a BLAST search of public sequence databases, the NOV69a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 69D. 



Table 69D. Public BLASTP Results for NOV69a 


Protein 
Accession 
Number 


Protein/Organism/Length


NOV69a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9Z1V0 


OLFACTORY RECEPTOR C6 - 
Mus musculus (Mouse), 313 aa. 


1..309 
1..310 


160/311 (51%) 
209/311 (66%) 


2e-86 


CAC88326 


SEQUENCE 18 FROM PATENT 
WO0164879 - Homo sapiens
(Human), 331 aa. 


8..306 
12..311 


142/301 (47%) 
200/301 (66%) 


4e-78 


CAC88328 


SEQUENCE 22 FROM PATENT
WO0164879 - Homo sapiens
(Human), 331 aa. 


8..306 
12..311 


142/301 (47%) 
198/301 (65%) 


2e-77 


CAC88327 


SEQUENCE 20 FROM PATENT
WO0164879 - Homo sapiens
(Human), 331 aa. 


8..306 
12..311 


141/301 (46%) 
198/301 (64%) 


8e-77 


O70270


OLFACTORY RECEPTOR-LIKE
PROTEIN - Rattus norvegicus 
(Rat), 327 aa. 


3..308 
11..316


136/307(44%) 
208/307 (67%) 


4e-76 



PFam analysis predicts that the NOV69a protein contains the domains shown in the 
Table 69E. 



WO 02/072757 



PCT/US02/06908 



Table 69E. Domain Analysis of NOV69a 


Pfam Domain 


NOV69a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


7tm_1: domain 1 of 1


39..244 


47/214(22%) 
147/214 (69%) 


1.9e-25 



Example 70. 

The NOV70 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 70A. 



Table 70A. NOV70 Sequence Analysis 




SEQ ID NO: 205 


962 bp 


NOV70a, 

CG59216-01 DNA Sequence


CATCTTCCTATGTGTCATGTCTCCTCTTAATGACACAAAAATGGAAGTCCTTAGATTC 

CTCCTTATCGGGATCACTGGACTGGAGAAAAGTCGCACCTGGATATCCATTCCTTTCT 

TATCTGTGTACCTTCTTTCTTGGATCGGTAATTTTACCGTCCTCTTTTTTATCAAGAC 

AG AGCAAAG CCTCCATGAACCT ATGT ATT ATTTG CTTTC CATGCTCTCCATCTCTG AC 

CTAGGGCTGTCTCTGTCTTCCTTACCCATCACTTTGGGACTATTCCTATTTGATGTCC 

ATGAAATT CATG C AG CT C CATGCTTTG C C C AGG AATTTTTT AT C C ATCTG TTT ACAGT 

CAGTCAAGCCTCTGTACTGTCTGTAATGGCATTTGACTGGTATGTGGCAATCCACAGT 

CCTTTGAGATACAGCACTATCTTAACTAGTCCCAGAGCCATCAAAACAGGGGTTCTTC 

TGACTTCCAAGAATGTTCTTTTGATCCTTCCACTGCCCTTTCTCTTGCAAAGGCTGAG 

ATATTGTCATCAAAACCTGCTCTCCCACTCCTATTGTCTCCACCAGGATGTCATGAAG 

CTGATGTGTTCTGACAACACAGTCAATGTTGTCTACGGACTCTGTGCAGGACTTTCTA 

CTATG CTGG ACTTGGTG TTT AT T A C C TTCT C CT AT ATT ATGATTTT AAGGGCTGT A CT 

GGGAATTGCTACCCCCAGACAGCAGTTCAAGGCCCTCAACACGTGCATCTCTCACATC 

TGTGCTGTGCTTATCTTCTATCTCCCCACGCTGAGTCCTGCCATGCTCCACCAGTTTG 

CC^GGGATCTCTCTCCTATGATCCACGTCCTCATGGCTOATAT^ 

ACCCCTGTTGAATCCCATCGTGTACTGTGTGAAGACCCACCAAATCCGAGAAAAGGTT 

GTGGGGAAACTTTGTCCAAAAGTAAGTTGATCAA 




ORF Start: ATG at 17 


ORF Stop: TGA at 956 




SEQ ID NO: 206 


313 aa 


MW at 35363.9kD 


NOV70a, 

CG59216-01 Protein Sequence 


MSPLNDTKMEVLRFLLIGITGLEKSRTWISIPFLSVYLLSWMGNFTVLPPIKTEQSLH 
EPMYYLLSMLSISDIiGLSLSSLPITLGLFLFDVHEIHAAPCFAQEFFIHLFTVSEASV 
LSVMAFDWYVAIHSPLRYSTILTSPRAIKTGVLLTSKNVLLILPLPFLLQRLRYCHQN 
LLSHSYCLHQDVMKLMCSDNTVNVVYGLCAGLSTMLDLVFITFSYIMILRAVLGIATP 
RQQFKALNTC I SHI CAVLI FYVPTLSAAMLHQFARDVS PMI HVLMADI FLLVP PLLN P 
IVYCVKTHQIREKWGKLCPKVS 



Further analysis of the NOV70a protein yielded the following properties shown in 
Table 70B. 



Table 70B. Protein Sequence Properties NOV70a 


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.4000 probability located in 
Golgi body; 0.3000 probability located in endoplasmic reticulum (membrane); 
0.2007 probability located in mitochondrial inner membrane 


SignalP 
analysis: 


Likely cleavage site between residues 50 and 51 



A search of the NOV70a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 70C. 



Table 70C. Geneseq Results for NOV70a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV70a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG72488 


Human OR-like polypeptide query 
sequence, SEQ ID NO: 2169 - Homo 
sapiens, 319 aa. [WO200127158-A2, 
19-APR-2001] 


1..313 
1..313 


310/313 (99%)
310/313 (99%) 


e-178 


AAG71557 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1238 - Homo sapiens, 
319 aa. [WO200127158-A2, 19-APR- 
2001] 


1..313 
1..313 


310/313 (99%)
310/313 (99%)


e-178 


AAU24573 


Human olfactory receptor AOLFR63 - 
Homo sapiens, 313 aa. 
[WO200168805-A2, 20-SEP-2001] 


1..311 
1..311 


186/311 (59%)
246/311 (78%) 


e-110 


AAG71558 


Human olfactory receptor polypeptide, 
SEQ ID NO: 1239 - Homo sapiens, 
313 aa. [WO200127158-A2, 19-APR- 
2001] 


1..311 
1..311 


185/311 (59%)
245/311 (78%)


e-109 


AAU24682 


Human olfactory receptor AOLFR181 
- Homo sapiens, 312 aa.
[WO200168805-A2, 20-SEP-2001] 


1..308 
1..306 


188/308 (61%)
238/308 (77%)


e-107 
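
The Expect Value column mixes a shorthand form such as e-178 with explicit forms such as 2e-96. A small sketch for normalizing these strings to floating-point numbers is given below, assuming the shorthand "e-N" stands for 1e-N; the function name is hypothetical.

```python
def parse_expect(cell: str) -> float:
    """Convert an Expect Value cell ('e-178', '2e-96', '0.0') to a float."""
    text = cell.strip()
    if text.startswith("e"):
        # Shorthand without a mantissa; assumed to mean 1e-N.
        text = "1" + text
    return float(text)


# Expect values as listed in Table 70C
cells = ["e-178", "e-178", "e-110", "e-109", "e-107"]
values = [parse_expect(c) for c in cells]
print(values == sorted(values))  # True: rows are ordered from best to worst hit
```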



In a BLAST search of public sequence databases, the NOV70a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 70D. 



Table 70D. Public BLASTP Results for NOV70a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV70a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


CAC38935 


SEQUENCE 9 FROM PATENT 
WO0131014 - Homo sapiens
(Human), 318 aa.


5..305 
6..306 


145/302 (48%) 
207/302 (68%) 


5e-79 


AAL35109


PROSTATE-SPECIFIC G 
PROTEIN-COUPLED RECEPTOR 
RA1C - Mus musculus (Mouse), 320
aa.


14..305 
11..303


141/293 (48%) 
199/293 (67%) 


7e-79 


CAC37756 


SEQUENCE 1 FROM PATENT 
WO0125434 - Homo sapiens
(Human), 317 aa. 


5..305 
5..305 


145/302 (48%) 
207/302 (68%) 


7e-79 


O88628


Olfactory receptor 51E2 (G-protein 
coupled receptor RAlc) - Rattus 
norvegicus (Rat), 320 aa. 


14..305 
11..303


141/293 (48%) 
200/293 (68%) 


7e-79 


Q9H255


Olfactory receptor 51E2 (Prostate 
specific G-protein coupled receptor)
(HPRAJ) - Homo sapiens (Human),
320 aa. 


14..305 
11..303


139/293 (47%) 
198/293 (67%) 


7e-78 



PFam analysis predicts that the NOV70a protein contains the domains shown in the 
Table 70E. 



Table 70E. Domain Analysis of NOV70a 


Pfam Domain 


NOV70a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


7tm_1: domain 1 of 2


43..151 


30/111 (27%) 
73/111 (66%) 


6.3e-14 


YCF9: domain 1 of 1 


208..262 


10/59(17%) 
31/59 (53%) 


7.5 


7tm_1: domain 2 of 2


212..293


18/93 (19%)
55/93 (59%)


0.00034 



Example 71. 

The NOV71 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 71A.



Table 71A. NOV71 Sequence Analysis




SEQ ID NO: 207 


995 bp 


NOV71a, 

CG59214-01 DNA Sequence 


GCACAATQTCTGTCTTCAATAGTTCTGCCTTATACCCTCGCTTCCTCCTAACGGGCCT 
CTCAGGCCTTGAAAGCAGATATGACTTGATTTCCCTG CCCATCTTCTTGGTTT ATGC C 
ACCTCAATTGCCGGGAACATTAGCATCCTCTTCATTATCAGAACTGAGTCTTCCCTCC 
AC CAACCG ATGT ATTACTTT CTGTCAATGCTGG CATTCACTGACCTGGGCCTATCTAA 
CACTACCTTACCTACCATGTTCAGTGTCTTCTXSGTTCCJ^TGCCCGGGAGATCTCCTTC 
AATGCTTGTCTGGT C CAAATGTACTTCATTCATGTTTTCTCGATT ATTG AGTCAGCTG 
TACTCCTGGCTATCGCCTTTGACTGCTTTATAGCJ^TCrGTGAACCCTTGCGCTATGC 
AG CCATC CTAACCAATGATGT AATCATTGGGATTGGGTTGG CAATTGCTGGAAGGGCC 
TTGGCTCTGGTCTTTCCAGCTTCTTTCCTCTTGAAGAGGCTTCAATATCATGATGTCA 
ATATTCTGTCCTACCTCTTCTGCCTGCACCAGGACCrCATAAAGACGACTGTATCCAA 
CTGTCGAGTCAGCAGCATCTATGGCCTCATGGTGGTCATCTGTTCCATGGGACTTGAT 
TCAGTGCTTCTCCTCCTCTCCTATGTCCTCATCCTGGGCACAGTGTTGAGTATAGCCT 
CCAAGGCAGAGAGAGTGAGAGCCCTCAATACTTGCATCTCCCACATCTGTGCTGTACT
CACCTTCTATACACCAATGATTGGGCTATCTATGATCCATCGCTATGGACAGAATGCT 
TCCTCAATTGTCCATGTGCTGATGGCCAATGTCTACTTGCTGGTTCCACCTCTCATGA 
ACCCCGTTGTCTACAGTCTTAAGACCAAGCAGATTCGTGACAGAATCTTCAATAAATT 
CAAGAAACATGAAGTGTAGATGACAGAGATTCTGAAACATAACTTTCCCTCCATTCCC 


CATATATTT 




ORF Start: ATG at 6 


ORF Stop: TAG at 945 




SEQ ID NO: 208


313 aa MW at 35044.2kD 


NOV71a, 

CG59214-01 Protein Sequence


MSVFNSSALYPRFLLTGLSGtiESRYDLISLPIFLVYATSIAGNISILFIIRTESSLHQ 
PMYYF LS MLAFTD I/3LSNTTL PTM FS VFW FHARE I S FNACL VQMY F I HVF S 1 1 ES A VL 
LAMAFDCFIAICEPLRYAAILTNDVIIGIGLAIAGRAliALVFPASFLLKRLQYHDVNI 
LS YLFCLHQDLI KTTVSNCRVSS I YGLMWICSMGLDSVLLLLSYVLILGTVLS IASK 
AERVRAIOTCISHICAVLTFYTPMIGLSMIHRYGQNASSIVHVLMANVYLLVPPU^NP 
WYSVKTKQI RDRI FNKFKKHEV 



Further analysis of the NOV71a protein yielded the following properties shown in
Table 71B. 



Table 71B. Protein Sequence Properties NOV71a 


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.4047 probability located in 
mitochondrial inner membrane; 0.4000 probability located in Golgi body; 0.3480 
probability located in mitochondrial intermembrane space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV71a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 71C. 



Table 71C. Geneseq Results for NOV71a


Geneseq 
Identifier 


Protein/Organism/Length 
[Patent #, Date] 


NOV71a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG72605 


Human OR-like polypeptide 
query sequence, SEQ ID NO: 
2286 - Homo sapiens, 318 aa. 
[WO200127158-A2, 19-APR- 
2001] 


1..309 
4..313 


295/310(95%) 
298/310 (95%) 


e-163 


AAG71519 


Human olfactory receptor 
polypeptide, SEQ ID NO: 1200 - 
Homo sapiens, 318 aa. 
[WO200127158-A2, 19-APR- 
2001] 


1..309 
4..313 


295/310(95%) 
298/310(95%) 


e-163 


AAU24683 


Human olfactory receptor
aa. [WO200168805-A2, 20-SEP-
2001]


5..308
9..312


178/304 (58%)
235/304 (76%)


e-102


AAG71715 


Human olfactory receptor 
polypeptide, SEQ ID NO: 1396 - 
Homo sapiens, 314 aa. 
[WO200127158-A2, 19-APR- 
2001] 


5..308 
9..312 


178/304(58%) 
235/304 (76%) 


e-102 


ABB44526 


Human GPCR4a polypeptide 
SEQ ID NO 11 - Homo sapiens,
315 aa. [WO200174904-A2, 11- 
OCT-2001] 


5..308 
6..309 


169/304(55%) 
227/304 (74%) 


2e-96 



In a BLAST search of public sequence databases, the NOV71a protein was found to
have homology to the proteins shown in the BLASTP data in Table 71D. 



Table 71D. Public BLASTP Results for NOV71a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV71a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9H344 


Olfactory receptor 51I2
(HOR5'beta12) - Homo sapiens
(Human), 312 aa.


13..308
12..307 


154/296 (52%) 
221/296 (74%) 


2e-91 


Q9H2C8


ODORANT RECEPTOR 
HOR3'BETA1 - Homo sapiens
(Human), 321 aa. 


2..308 
10..316 


160/307 (52%) 
216/307 (70%) 


5e-89 


Q9H343 


Olfactory receptor 51I1
(HOR5'beta11) - Homo sapiens
(Human), 314 aa. 


5..312 
5..313 


156/309 (50%) 
223/309 (71%) 


9e-89 


AAL35109 


PROSTATE-SPECIFIC G 
PROTEIN-COUPLED RECEPTOR 
RA1C - Mus musculus (Mouse), 320 
aa. 


13..309 
11..307


148/297 (49%) 
207/297 (68%) 


2e-86 


Q924X8 


OLFACTORY RECEPTOR S85 - 
Mus musculus (Mouse), 314 aa. 


2..304 
3..305 


150/303(49%) 
221/303 (72%) 


1e-85



PFam analysis predicts that the NOV71a protein contains the domains shown in the 
Table 71E. 

Table 71E. Domain Analysis of NOV71a


Pfam Domain 


NOV71a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


7tm_1: domain 1 of 1


42..138 


24/99 (24%) 
67/99 (68%) 


7.8e-14 



Example 72. 

The NOV72 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 72A.



Table 72A. NOV72 Sequence Analysis 




SEQ ID NO: 209 


1004 bp 


NOV72a, 

CG59211-01 DNA Sequence


CTTCTCATCTTTTCCCTCAAATACTGGGATGTCCATTCTCAATACCTCTGAAATGGAA 
ATCTCTATTTT CT ACTTGGTTGGGATCC CAGGTTTGGAG CATGCCAATATTTGGAT CT 
CTATCCCCATATGTCTCATGTACACTGTTGCTATCCTAGGGAATTGTACCATTCTGTT 
TTTC ATAAAAACAGAGCCTT CTTTGCATG AG CCCATGTACTATTTTCTCTCCATGTTG 
GCTCTCTCTGACCTGGGACTATCCCTCTCCTCTCTCCCTACCATGTTAAGGATTTTCC 
TGTTCAATGCTCCAGGAATTTCCCCTGATGCCTGTATTGCTCAAGAGTTTTTCATCCA 
TGGATTCTCAGCTATGGAGTCATCTGTACTTCTTATAATGTCCTTTGATCGCTTTATT 
G CCATCTGCAACCCCCTGAG ATACACTT C C ATCCT CAC CAGTGCCAGAGTC ATTCAAA 
TTGGGCTTGCTTTTTCTCTCAAAAATGTTTTGTTGATCCTCCCATTTCCTTTCACTCT 
AAAACATCT AAAATATTGT AAG AAG AAC CTCCTGTCCCAATCCTACTGC CTCCATCAA 
GATGTCATGAAACTGGCCTGCACTGACAACAAGGTCAACATCATCTATGGCTTATTTG 
TGGCTCTCACAGGCATCCTAGACTTGACATTTATTTTCATGTCCTACATGTTGATACT 
GAAAGCAGTGTTGAGCAT AG CATCAAGAAAG AAAAGGCT CAAGGT CCTCAATACATGT 
GTTTCCCACATCTGTGCTGTGCTCATCTTCTATGTGCCCATTATCTCCCTAGCTGTCA 
TCTACCGGTTTGCCAAACACAGTTTCCCAATCACTAGGATCCTCATAGCTGATGCTTT 
TCTGCTGGTGCCTCCATTGATGAACCCCATTGTATACTGTGTGAAGAGCCAGCAGATA 
AGAAATCTTGTCTTAG AAAAACTGTG C CAGAAGCAAAGCTGAAGCGGATGCTT AACC A 
CATGATG CTT AACCCAAA 




ORF Start: ATG at 29 


ORF Stop: TGA at 968 




SEQ ID NO: 210 


313 aa


MW at 35313.1kD


NOV72a, 

CG59211-01 Protein Sequence


MSIIOTSEMEISIFYLVGIPGLEHANIWISIPICLMYTVAILGNCTILFFIKTEPSLH 
E PMYY FLS MLALSDLGLS LS S L PTMLR I FLFNAPG I S PD AC I AQE F F I HG FS AME S S V 
LLIMSFDRFIAICWPLRYTSILTSARVIQIGLAFSLKNVLLILPFPFTLKHLKYCKKN 
LLS QS YCLHQD VMKLACTDNKVN 1 1 YG L FVALTG I LDLT F I FMS YMLI L KAVLS I AS R 
KKRLKVLNTCVSH I CAVLI FYVP 1 1 SLAVI YRFAKHS FP ITRI LI ADAFLLVP PLMNP 
I VYCVKS QQ I RNLVLE KLCQKQ S 



Further analysis of the NOV72a protein yielded the following properties shown in 
Table 72B. 



Table 72B. Protein Sequence Properties NOV72a 


PSort
analysis: 


0.6000 probability located in plasma membrane; 0.4000 probability located in 
Golgi body; 0.3000 probability located in endoplasmic reticulum (membrane); 
0.0300 probability located in mitochondrial inner membrane 


SignalP 
analysis: 


Likely cleavage site between residues 44 and 45 


A search of the NOV72a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 72C. 



Table 72C. Geneseq Results for NOV72a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV72a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for
the Matched 
Region 


Expect 
Value 


AAG71564 


Human olfactory receptor 
polypeptide, SEQ ID NO: 1245 - 
Homo sapiens, 322 aa. 
[WO200127158-A2, 19-APR-2001]


1..313 
5..317 


312/313(99%) 
312/313(99%) 


e-177 


AAU24573


Human olfactory receptor AOLFR63 -
Homo sapiens, 313 aa.
[WO200168805-A2, 20-SEP-2001] 


1..312 
1..312 


225/312(72%) 
272/312 (87%)


e-131 


AAG71721 


Human olfactory receptor 
polypeptide, SEQ ID NO: 1402 - 
Homo sapiens, 316 aa. 
[WO200127158-A2, 19-APR-2001]


1..311 
1..311 


236/312(75%) 
267/312(84%) 


e-131 


AAU24682


Human olfactory receptor AOLFR181
- Homo sapiens, 312 aa. 
[WO200168805-A2, 20-SEP-2001] 


1..308 
1..306 


224/308 (72%)
265/308 (85%)


e-131 


AAG71701


Human olfactory receptor 
polypeptide, SEQ ID NO: 1382 - 
Homo sapiens, 312 aa. 
[WO200127158-A2, 19-APR-2001]


1..308 
1..306 


224/308 (72%)
265/308 (85%)


e-131 



In a BLAST search of public sequence databases, the NOV72a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 72D. 



Table 72D. Public BLASTP Results for NOV72a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV72a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9H344 


Olfactory receptor 51I2
(HOR5'beta12) - Homo sapiens
(Human), 312 aa.


12..304
10..303


152/294 (51%) 
219/294 (73%) 


6e-90 


Q9EQQ7


(Mouse), 319 aa.


1..310


219/310 (70%)


9e-89


Q9H343 


Olfactory receptor 51I1
(HOR5'beta11) - Homo sapiens
(Human), 314 aa.


4..313
4..314


154/311 (49%) 
226/311 (72%) 


9e-89 


CAC38935


SEQUENCE 9 FROM PATENT
WO0131014 - Homo sapiens
(Human), 318 aa.


5..305
6..306


153/302 (50%) 
217/302(71%) 


2e-87 


CAC37756 


SEQUENCE 1 FROM PATENT 
WO0125434 - Homo sapiens
(Human), 317 aa.


5..305
5..305 


153/302(50%) 
217/302(71%) 


3e-87 



PFam analysis predicts that the NOV72a protein contains the domains shown in the 
Table 72E. 



Table 72E. Domain Analysis of NOV72a 


Pfam Domain


NOV72a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


DUF40: domain 1 of 1 


109..134 


10/26(38%) 
20/26 (77%) 


0.38 


7tm_1: domain 1 of 2


43..144 


27/107(25%) 
71/107(66%) 


1.6e-15 


7tm_1: domain 2 of 2


212..293 


16/93(17%) 
56/93 (60%) 


4.7 


Sina: domain 1 of 1 


300..311 


7/12 (58%) 
10/12 (83%) 


1 



Example 73. 

The NOV73 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 73A.



Table 73A. NOV73 Sequence Analysis 




SEQ ID NO: 211


1581 bp 


NOV73a, 

CG59276-01 DNA Sequence 


CTGGTGGGTTGGCGGCTAAGGGGCGGAGACAAGAGGGGCCGCCACCATCTCCTCCAAT 


GGAAGGGAGACAGGGGCGGGCTTAATGACGGAAGGAGCATGGCGTXMAGACACCTGAA 


AAAGCGGGCCCAGGATGCTGTGATCATCCTGGGGGGAGGAGGACTTCTCTTCGCCTCC 
TACCTGATGG CCACGGGAGATGAGCGTTT CTATGCTG AACACCTG ATG CCGACTCTG C 
AGGGGCTGCTGGACCCGGAGTCAGCCCACAGACTGGCTGTTCGCTTCACCTCCCTGGG 
G CTCCTTCCACGGG CCAG ATTTCAAG ACTCTG ACATGCTGGAAGTG AG AGTTCTGGG C 
CATAAATTCCGAAATCCAGTAGGAATTGCTGCAGGATTTGACAAGCATGGGGAAGCCG 
TGGACGGACTTTATAAGATGGGCrTTGGTTT^GT^GAGATAGGAAGTGTGACTCCAAA 
ACCTCAGGAAGGAAACCCTAGACCCAGAGTCTTCCGCCTCCCTGAGGACCAAGCTGTC 
ATTAACAGGTATGGATTTAACAGTCAOSGGCTTT^ 

CCAG ACAG CAG AAG C AGG CCAAGCT C ACAGAAG ATGG ACTG C CTCTGGGGGT CAACTT 
GGGGAAGAACAAGACCTCAGTGGACGCCGCGGAGGACTACGCAGAAGGGGTGCGCGTA 
CTGGGCCCCCTGGCCGACTACCTGGTGGTGAATGTGTCCAGCCCCAACACTGCCGGGC 
TGCGGAGC CTTCAGGGAAAGGCCCAGCTGCGCCGC CTGCTGACCAAGGTGCTGCAGGA 
GAGGGATGGCTTGCGGAGAGTGCACAGGCCGGCAGTCCTGGTGAAGATCGCTCCTGAC 
CTCACCAGCCAGGATAAGGAGGACATTGCCAGTGTGGTCAAAGAGTTGGGCATCGATG 
GGCTGATTGTTACGAACACCACCGTGAGTCGCCCTGCGGGCCTCCAGGGTGCCCTGCG 
CTCTGAAACAGGAGGGCTGAGTGGGAAGCCCCTCCGGGATTTATCAACTCAAACCATT 
CGGGAGATGTATGCACTCACCCAAGGCAAGGTTTCCCGTCGAGTTCCCATAATTGGGG 
TTGGTGGTGTGAGCAGCGGGC AGGACGCGCTGGAG AAG ATCCGGG CAGGGG C CTCCCT 
GGTGCAGCTGTACACGGCCCTCACCTTCTGGGGGCCACCCGTTGTGGGCAAAGTCAAG 
CGGG AACTGGAGG CC CTTCTGAAGG AGC AGGGCTTTGGCGG AGTCACAG ATGC CATTG 
GAGC AGAT C ATCGGAGG ATGAGG AAACGGGC AG AG AAG CGG CTG ATTGTCC AGTC CCC 
CTGCGTGGAGGCTGCTTGGCTGGGCTCGAGCCCAGCGGTGGTGGGTCAGTTGGGACCT 
GGTGG TCTGCTGGTGGTCAGTTTGGG AATTT CC AGGTACG ATTGTTTTCAGGC ACTGT 
TCTTTGACTTGGTTGCAGAAAAACAGATTTTGCAACACTTTCCAAGGACACAGTGTTA 
CCACTCCCTCACCCTGCCATGGCCTCTTGGTTCTGCTTTTAACTTCTOAGCCTCAGGG 
AGTCCATCTTGTCTG 




ORF Start: ATG at 97 


ORF Stop: TGA at 1555




SEQ ID NO: 212


486 aa 


MW at 52982.6kD 


NOV73a, 

CG59276-01 Protein Sequence 


MAWRHLKKRAQDAVI I LGGGGLLFASYLMATGDERFYAEHLMPTLQGLLDPESAHRLA 
VR FT S LG LLPRARFQD S DMLE VRVLGHKFRN P VG I AAG FDKHGE A VDGLY KMG FGFVE 
IGSVTPKPQEGNPRPRVFRLPEDQAVINRYGFNSHGLSWEHRLRARQQKQAKLTEDG 
LPLGVNI^KNKTSVDAAEDYAEGWVIjGPIADYLVVNV 

LTKVLQERDG LRRVHR P A VL VKI APDLTSQD KE D I AS WKE LG I DGL I VTNTTVS RP A 
GLQGALRSETGGLSGKPLRDLSTQTIREMYALTQGKVSRRVPIIGVGGVSSGQDALEK 
I RAG AS L VQL YT ALT F WG P P WG KVKRELEALLKEQG FGG VTD A I G ADH RRMRKRAE K 
RLIVQSPCVEAAWLGSSPAWGQLGPGGLLWSLGISRYDCFQALFFDLVAEKQILQH 
FPRTQCYHSLTLPWPLGSAFNF 
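
The molecular weight and residue count in Table 73A can be related through the mean mass per residue. The short sketch below performs that division; the function name is illustrative. As a general rule of thumb (not taken from this document), an average amino acid residue weighs roughly 110 Da, so a result near that value suggests the tabulated figure is expressed in daltons even though the unit is printed as kD.

```python
def mean_residue_mass(molecular_weight: float, n_residues: int) -> float:
    """Average mass per residue, in the same unit as molecular_weight."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return molecular_weight / n_residues


# NOV73a: MW 52982.6 for 486 residues, as listed in Table 73A
print(round(mean_residue_mass(52982.6, 486), 1))  # 109.0
```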



Further analysis of the NOV73a protein yielded the following properties shown in 
Table 73B. 



Table 73B. Protein Sequence Properties NOV73a 


Psort 
analysis: 


0.8110 probability located in plasma membrane; 0.6400 probability located in
endoplasmic reticulum (membrane); 0.3700 probability located in Golgi body; 
0.1839 probability located in microbody (peroxisome) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV73a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 73C.



Table 73C. Geneseq Results for NOV73a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV73a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB70780 


Tobacco dihydro-orotase protein - 
Nicotiana tabacum, 458 aa. 
[WO200118190-A2, 15-MAR-2001] 


36..398 
81..458


199/383 (51%) 
257/383 (66%) 


e-101 


AAG01301 


Human secreted protein, SEQ ID NO:
[EP1033401-A2, 06-SEP-2000]


1..144
1..144


143/144 (99%)
144/144 (99%)


3e-79


AAG91420 


C glutamicum protein fragment SEQ 
ID NO: 5174 - Corynebacterium 
glutamicum, 371 aa. [EP1108790-A2,
20-JUN-2001] 


76..396
60..366 


131/328 (39%) 
190/328 (56%) 


6e-60 


AAB46597 


C. glutamicum dihydroorotate 
dehydrogenase protein - 
Corynebacterium glutamicum, 321 aa. 
[DE19929364-A1, 28-DEC-2000] 


76..396 
10..316 


131/328(39%) 
190/328 (56%) 


6e-60 


AAB80123 


Corynebacterium glutamicum MP 
protein sequence SEQ ID NO:980 - 
Corynebacterium glutamicum, 334 aa. 
[WO200100843-A2, 04-JAN-2001] 


76..396
23..329 


131/328(39%) 
190/328 (56%) 


1e-59



In a BLAST search of public sequence databases, the NOV73a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 73D. 



Table 73D. Public BLASTP Results for NOV73a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV73a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q02127 


Dihydroorotate dehydrogenase, 
mitochondrial precursor (EC 1.3.3.1) 
(Dihydroorotate oxidase) (DHOdehase) 
- Homo sapiens (Human), 396 aa 
(fragment). 


1..399 
2..396 


392/399 (98%) 
394/399 (98%) 


0.0 


PC1219


dihydroorotate oxidase (EC 1.3.3.1) 
precursor - human, 397 aa. 


1..399 
3..397 


388/399 (97%) 
393/399 (98%) 


0.0 


Q63707 


Dihydroorotate dehydrogenase, 
mitochondrial precursor (EC 1.3.3.1) 
(Dihydroorotate oxidase) (DHOdehase) 
- Rattus norvegicus (Rat), 395 aa. 


1..399 
1..395 


350/399 (87%) 
369/399 (91%) 


0.0 


O35435


Dihydroorotate dehydrogenase, 
mitochondrial precursor (EC 1.3.3.1) 
(Dihydroorotate oxidase) (DHOdehase) 
- Mus musculus (Mouse), 395 aa. 


1..399 
1..395 


346/399 (86%) 
366/399 (91%) 


0.0 


Q9FZM9


DIHYDROOROTATE 
DEHYDROGENASE - Oryza sativa 
(Rice), 468 aa. 


29..398 
79..468


206/394 (52%) 
261/394 (65%) 


e-101 

PFam analysis predicts that the NOV73a protein contains the domains shown in the 
Table 73E.



Table 73E. Domain Analysis of NOV73a 


Pfam Domain 


NOV73a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


DHOdehase: domain 1 of 1


77..381 


183/331 (55%) 
282/331 (85%) 


1.9e-169 



Example 74. 



The NOV74 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 74A. 



Table 74A. NOV74 Sequence Analysis




SEQ ID NO: 213


1875 bp 


NOV74a, 

CG59268-01 DNA Sequence 


ATGGCCGCAGCCTCGCCTCTGCGCGACTGCCAGGCCTGGAAGGATGCGAGGCTCCCGC 
TCTCCACC ACAAG CAACGAAG CCTGC AAGCTGTTCG ATGCCACGCTGACC CAGTATGT 
AAAATGGACCAATGACAAGAGTCTCGGTGGCATCGAGGGCTGCCTGTCAAAGCTCAAA 
GCAGCAGATCCAACCTTTGTGATGGGCCACGCCATGGCTACTGGCCTTGTGCTGATTG 
GCACTGGAAGCTCCGTGAAGCTGGACAAAGAGCTGGACCTGGCTGTCAAGACAATGGT 
GGAGATTTCAAGAACCCAGCCGCTGACAAGGCGGGAGCAGCTGCACGTGTCTGCAGTA 
GAGACATTTGCC^TGGGAACTTTCCGAAAGCCTGTGAACTATGGGAACAGATTCTCC 
AGGACCACCCGACAGACATGTTGGCCCTGAAATTTTCCCATGATGCTTATTTTTACCT 
GGGCTATCAGGAACAGATGAGAGATTCTGTTGCTCGAATTTACCCCTTCTGGACACCT 
GACATCCCCCTAAGCAGCTATGTGAAAGGCATCTACTCTTTTGGCTTGATGGAAACCA 
ACTTCTACGACCAGGCAGAAAAACTCGCCAAAGAGGCACCAACTCTTTGTCTTCAACA 
CCAGCACCC C ACAGACAACTACTGGG CAGGAAAAG CAGG CTGTGATGGGG CCAGGAGT 
GGTAACACATGGGCTCTGTGTCTGCAGCCCCAGGCTGACGCATGGTCGGTGCACACCG 
T CGCT C ACAT C CACG AG ATG AAAG CAG AG AT CAAGG ATGGGTTGG AATTCATG C AG CA 
CTCAGAGACCTTCTGGAAGGACTCTGATATGTTGGCTTGTCATAACTATTGGCACTGG 
GCTTTATATCTGATTGAGAAGGGTTTAATAAGGAGAACTTTATTCTTCCAGGGCGAAT 
ATGAGGC CGCG CTGACCATCTACGAT ACCCACATCCTT CCC AGCCTGCAGGCC AACGA 
TGCAATG CTGG ACGTGGTGGACAGCTGCTCCATG CTCT ACCGCCTG CAGATGGAAGGA 
GTGTCTGTGGG CCAG CGGTGG CAGGATGTCCTGCCTGTGGCCCGGAAGCAC AG CCGAG 
ACCACATCCTGCTGTTCAATGACGCACACTTCCTGATGGCATCCCTGGGTGCACACGA 
CCCCCAGACCACACAGGAGCTGCTGACCACCCTGCGGGACGCCAGCGAGTATGCAGAG 
GGGCCTTCTCGGGGTGGGGGTCCTCACCCTGCCGAGAGGTGCCAGGCCTTTGCCTGTA 
TT ATC AGCAATCCTGACGGTT CTGTT AG ATTGGCACTGTTATG CCTGCTTACAG ATGA 
GCAAACTGAGGCTGGAAGATCCCCAGGGGAGAACTG CCAGCAC CTCCTGG C CCGAGAC 
GTGGGGCTGCCCCTGTGCCAGGCCCTGGTGGAGGCTGAGGACGGGAACCCTGACCGCG 
TCCTGGAGCTGCTCCTGCCCATCCGCTACCGGATCGTCCAGCTCGGTGGGAGCAATGC 
CCAGAGAGACGTCTTCAACCAGCTGCTGATTCACGCGGCCTTAAACTGCACCTCCAGC 
GTCCATAAGAACGTAGCCCGGAGCCTTCTGATGGAGCGTGATGCCTTGAAGCCCAACT 
CGCCCCTGACCGAGCGGCTCATCCGCAAGGCAGCTACCGTCCACCTCATGCAGAAGCC 
TTCTACCCGCCAACCCCCACTGCAGGCTGCTCTCTCCATGGAAGGAGGCGGCGGCCGC 
GATGAGCCTTCAGCCTGCCGGGCAGGGGACGTGAACATGGATGACCCTAAGAAGGAAG 
GCAAGTCCTTGCTGCTGCGGCGCTGTTGTTGTTCAGGATGTTCAGTAGAGATGGAGGG 
TGATTTAATGTTTCCCTGA 




ORF Start: ATG at 1 


ORF Stop: TGA at 1873 




NOV74a, CG59268-01 Protein Sequence | SEQ ID NO: 214 | 624 aa | MW at 69393.3 Da


MAAASPLRDCQAWKDARLPLSTTSNEACKLFDATLTQYVKWTNDKSLGGIEGCLSKLK 
AADPTFVMGHAMATGLVLIGTGSSVKLDKELDLAVKTMVEISRTQPLTRREQLHVSAV
ETFANGNFPKACELWEQILQDHPTDMLALKFSHDAYFYLGYQEQMRDSVARIYPFWTP
DIPLSSYVKGIYSFGLMETNFYDQAEKLAKEAPTLCLQHQHPTDNYWAGKAGCDGARS
GNTWALCLQPQADAWSVHTVAHIHEMKAEIKDGLEFMQHSETFWKDSDMLACHNYWHW
ALYLIEKGLIRRTLFFQGEYEAALTIYDTHILPSLQANDAMLDVVDSCSMLYRLQMEG
VSVGQRWQDVLPVARKHSRDHILLFNDAHFLMASLGAHDPQTTQELLTTLRDASEYAE
GPSRGGGPHPAERCQAFACIISNPDGSVRLALLCLLTDEQTEAGRSPGENCQHLLARD
VGLPLCQALVEAEDGNPDRVLELLLPIRYRIVQLGGSNAQRDVFNQLLIHAALNCTSS
VHKNVARSLLMERDALKPNSPLTERLIRKAATVHLMQKPSTRQPPLQAALSMEGGGGR
DEPSACRAGDVNMDDPKKEGKSLLLRRCCCSGCSVEMEGDLMFP
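The ORF coordinates and the polypeptide length in Table 74A can be cross-checked against each other. The following is a minimal sketch, assuming the standard genetic code and assuming that the reported stop position is the first base of the stop codon; the function names are illustrative and do not come from the specification.

```python
# Standard genetic code, built from the canonical T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_residue_count(start, stop):
    """Residues encoded between a 1-based ATG position and the 1-based
    position of the first base of the stop codon (assumed convention)."""
    coding = stop - start
    if coding % 3:
        raise ValueError("ORF length is not a multiple of three")
    return coding // 3


def translate(dna):
    """Translate complete codons until a stop codon or the end of the input."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# ORF Start: ATG at 1, ORF Stop: TGA at 1873, as reported for NOV74a.
print(orf_residue_count(1, 1873))
# First ten codons of SEQ ID NO: 213.
print(translate("ATGGCCGCAGCCTCGCCTCTGCGCGACTGC"))
```

Under these assumptions the coordinate arithmetic gives 624 residues, which agrees with the length reported for SEQ ID NO: 214.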



Further analysis of the NOV74a protein yielded the following properties shown in 
Table 74B. 



Table 74B. Protein Sequence Properties NOV74a 


PSort analysis: 0.4328 probability located in mitochondrial matrix space; 0.3000 probability located in microbody (peroxisome); 0.1137 probability located in mitochondrial inner membrane; 0.1137 probability located in mitochondrial intermembrane space
SignalP analysis: No Known Signal Sequence Predicted
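A PSort summary of this form can be turned into ranked (probability, location) pairs. The sketch below assumes the entries are separated by semicolons and always use the phrase "probability located in"; the parser name is hypothetical.

```python
import re


def parse_psort(text):
    """Return (probability, location) pairs from a PSort summary string."""
    pattern = re.compile(r"(\d\.\d+) probability located in ([^;]+)")
    return [(float(p), loc.strip()) for p, loc in pattern.findall(text)]


nov74a_psort = (
    "0.4328 probability located in mitochondrial matrix space; "
    "0.3000 probability located in microbody (peroxisome); "
    "0.1137 probability located in mitochondrial inner membrane; "
    "0.1137 probability located in mitochondrial intermembrane space"
)

predictions = parse_psort(nov74a_psort)
best = max(predictions)  # tuples compare by probability first
print(best)
```

For NOV74a the highest-scoring entry is the mitochondrial matrix space.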



A search of the NOV74a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 74C. 



Table 74C. Geneseq Results for NOV74a 


Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV74a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAM41338 | Human polypeptide SEQ ID NO 6269 - Homo sapiens, 478 aa. [WO200153312-A1, 26-JUL-2001] | 1..559 / 10..478 | 463/559 (82%) / 466/559 (82%) | 0.0
AAM39552 | Human polypeptide SEQ ID NO 2697 - Homo sapiens, 453 aa. [WO200153312-A1, 26-JUL-2001] | 1..529 / 1..439 | 434/529 (82%) / 437/529 (82%) | 0.0
AAG02871 | Human secreted protein, SEQ ID NO: 6952 - Homo sapiens, 104 aa. [EP1033401-A2, 06-SEP-2000] | 1..102 / 1..102 | 102/102 (100%) / 102/102 (100%) | 1e-52
AAM40893 | Human polypeptide SEQ ID NO 5824 - Homo sapiens, 746 aa. [WO200153312-A1, 26-JUL-2001] | 568..604 / 1..37 | 32/37 (86%) / 32/37 (86%) | 2e-10
AAM40892 | Human polypeptide SEQ ID NO 5823 - Homo sapiens, 746 aa. [WO200153312-A1, 26-JUL-2001] | 568..604 / 1..37 | 32/37 (86%) / 32/37 (86%) | 2e-10
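The residue ranges in Table 74C can be converted into span lengths and a query-coverage fraction. This is a small sketch, assuming ranges are inclusive and written as `start..end`; the helper names are illustrative.

```python
def span(range_text):
    """Number of residues in an inclusive 'start..end' range."""
    start, end = (int(part) for part in range_text.split(".."))
    return end - start + 1


def query_coverage(range_text, query_length):
    """Fraction of the query sequence covered by the matched range."""
    return span(range_text) / query_length


# First row of Table 74C: NOV74a residues 1..559 against hit residues 10..478.
print(span("1..559"), span("10..478"))
print(round(query_coverage("1..559", 624), 3))
```

For the first row, the query span of 559 residues equals the alignment length used as the denominator in the identity figures, while the hit contributes only 469 residues; the difference presumably corresponds to gap positions in the alignment.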

In a BLAST search of public sequence databases, the NOV74a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 74D. 



Table 74D. Public BLASTP Results for NOV74a 


Protein Accession Number | Protein/Organism/Length | NOV74a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
AAH18918 | HYPOTHETICAL 45.7 KDA PROTEIN - Homo sapiens (Human), 404 aa. | 66..559 / 1..404 | 399/494 (80%) / 402/494 (80%) | 0.0
Q9NWP8 | KAIA2372 PROTEIN - Homo sapiens (Human), 336 aa. | 1..352 / 1..310 | 305/352 (86%) / 308/352 (86%) | e-172
Q9XW02 | Y54G11A.4 PROTEIN - Caenorhabditis elegans, 497 aa. | 4..556 / 6..458 | 165/557 (29%) / 256/557 (45%) | 3e-61
Q9XW01 | Y54G11A.7 PROTEIN - Caenorhabditis elegans, 407 aa. | 4..347 / 6..305 | 122/347 (35%) / 177/347 (50%) | 7e-53
Q98CS1 | MLR5032 PROTEIN - Rhizobium loti (Mesorhizobium loti), 440 aa. | 60..553 / 46..435 | 145/496 (29%) / 215/496 (43%) | 1e-43
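Expect values in these tables appear both in full form (`3e-61`) and in a shorthand with no leading digit (`e-172`). The sketch below normalizes both to floating-point numbers, under the assumption that the shorthand stands for a mantissa of 1; the function name is hypothetical.

```python
def parse_expect(text):
    """Convert an Expect value string to a float.

    Assumption: a bare exponent such as 'e-172' means 1e-172.
    """
    text = text.strip()
    if text.lower().startswith("e"):
        text = "1" + text
    return float(text)


# Expect values from Table 74D, in table order.
values = [parse_expect(v) for v in ["0.0", "e-172", "3e-61", "7e-53", "1e-43"]]
print(values == sorted(values))
```

With that reading, the rows of Table 74D are listed in order of increasing Expect value, that is, from the most to the least significant match.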



PFam analysis predicts that the NOV74a protein contains the domains shown in 
Table 74E. 



Table 74E. Domain Analysis of NOV74a 


Pfam Domain | NOV74a Match Region | Identities/Similarities for the Matched Region | Expect Value
Monooxygenase: domain 1 of 1 | 225..410 | 28/238 (12%) / 121/238 (51%) | 6.4



Example 75. 

The NOV75 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 75A. 



Table 75A. NOV75 Sequence Analysis 




NOV75a, CG59549-01 DNA Sequence | SEQ ID NO: 215 | 1851 bp


CAG CTACAG CAAAC ATCGTTCGAGATGTCCCACCAAGAGGGCAG CACAGGTGG CTT AC 
CAG ACTT AG TGACTG AAAG CCTG TT CAG CAG CCC AG AGG AG CAGTCTGGAG TAG CAG C 
GGTGACGGCGGCCTCCTCAGACATTGAAATGGCAGCCACAGAGCCATCGACCGGAGAT 
GGTGGTXSATACCAGGGATGGTGGTTTCCTGAACGATGCCAGCACAGAAAATCAAAACA 
CAGACTCAGAAAGTTCAAGTGAAGACGTCGAAC^TGAAAGCATG^TGAAGGTTTATT 
TGGTTACCCGTTAGTGGGAGAGGAGACAGAAAGGGAGGAGGAAGAAGAAGAGATGGAG 
GAGGAAGGGGAGGAGGAAGAACAGCCTCGGATGTGTCCACGATGCGGTCGCACCAACC 
ATGATCAGTGTTTGTTAGACGAGGATCAGGOGTTGGAGGAGTGGATTTCCTCAGAGAC 
ATCTGCCCTGCCCCGATCTCGCTGGCAAGTCCTTACTGCTCTTCGCCAGCGGCAGCTG 
GGTTCAAGTGC CCGCTTTGTATATG AGGCCTGTGGGG CAAG AACCTTTGTGCAGCGTT 
TCCGCCTGCAGTATCTTCTTGGAAGCCATGCCGGTTCTGTdAGTACCATACACTTTAA 
CCAGCGTGGCACCCGACTGGCCAGTAGCGGTGATGACTTAAGGGTGATAGTGTGGGAC 
TGGGTGCGGCAGAAGCCAGTACTGAACTTTGAGAGTGGTCACGATATTAATGTCATCC 
AGGCTAAGTTCTTTCCTAACTGTGGTGATTCCACTCTGGCCATGTGTGGCCATGATGG 
ACAGGTACGGGTAGCAGAACTAATTAATGCATCATATTGCGAGAATACTAAGCGTGTG 
GCCAAG CACAGGGGACCTGCCCACG AGTTGGCT CTGGAG CCAGACTCTCCTT AT AAGT 
TCCTCACTTCAGGTGAAGATGCCGTTGTGTTCACCATTGACCTCAGGCAAGACCGGCC 
AGCITCAAAAGTTGTGGTAACAAGAGAAAATGATAAGAAAGTCGGACTGTATACAATC 
TCTATGAATCCTGCCAATATTTACCAATTTGCAGTGGGTGGACATGATCAGTTTGTAA 
GGATTTATCACCAGAGGAGAATTGATAAGAAAGAAAACAATGGAGTACTCAAGAAATT 
CACTCCTCATCATCTGGTTTATTGTGATTTCCCAACAAACATCACCTGCGTTGTGTAC 
AGCCACGATGGOVCAGAGCTCCTGGCCAGCTACAATGATGAAGATATTTACCTCTTCA 
ACT C CT CTCTC AGTG ATG GTG CTC AAT ATGTT AAG AG AT AT AAGGGGC AC AG AAAT AA 
TGACACAATCAAATGTGTTAATTTCTATGGCCCCCGGAGTCAGTTTGTCGTGAGCGGT 
AGTGATTGTGGGCACGTCTTCTTCTGGGAGAAATCATCCTCCCAGATCATCCAGTTCA 
TGGAGGGGGACAGAGGAGATAT AGTAAACTGTCTTG AACCCCAC C CTTACCTACCTGT 
GTTGG CGACCAGTGGCCTAG ATCAGCATGTCAGGATCTGGACAC CCACAGCTAAAACT 
G C CACTG AG CTT AC TGGG TTAAAAG ATG TG ATT AAG AAG AA CAAG C AGG AGCG AG AT G 
AAGACAACITGAACTATACGGACTCGTTTGACAACCGCATCCTTCGGTTCTTCGTGCG 
TCACCTGTTACAGAGAG CTCATCAACCCGGCTGG AGAGATC ATGGAG CTGAGTTCCCA 
GATGAAGAAG AGTTGG ATG AGTCTT CCAGCACCT CAG AT ACATCCGAGG AGGAGGGC C 
AAG ATCGAGTG CAGTG CAT ACCATC CTGAAGGC CTCAT ATCCAGTCCAGCT AG 




ORF Start: ATG at 25 


ORF Stop: TGA at 1825 




NOV75a, CG59549-01 Protein Sequence | SEQ ID NO: 216 | 600 aa | MW at 67372.4 Da


MSHQEGSTGGLPDLVTESLFSSPEEQSGVAAVTAASSDIEMAATEPSTGDGGDTRDGG
FLNDASTENQNTDSESSSEDVELESMGEGLFGYPLVGEETEREEEEEEMEEEGEEEEQ
PRMCPRCGGTNHDQCLLDEDQALEEWISSETSALPRSRWQVLTALRQRQLGSSARFVY
EACGARTFVQRFRLQYLLGSHAGSVSTIHFNQRGTRLASSGDDLRVIVWDWVRQKPVL
NFESGHDINVIQAKFFPNCGDSTLAMCGHDGQVRVAELINASYCENTKRVAKHRGPAH
ELALEPDSPYKFLTSGEDAVVFTIDLRQDRPASKVVVTRENDKKVGLYTISMNPANIY
QFAVGGHDQFVRIYDQRRIDKKENNGVLKKFTPHHLVYCDFPTNITCVVYSHDGTELL
ASYNDEDIYLFNSSLSDGAQYVKRYKGHRNNDTIKCVKFYGPRSEFVVSGSDCGHVFF
WEKSSSQIIQFMEGDRGDIVNCLEPHPYLPVLATSGLDQHVRIWTPTAKTATELTGLK
DVIKKNKQERDEDNLNYTDSFDNRMLRFFVRHLLQRAHQPGWRDHGAEFPDEEELDES
SSTSDTSEEEGQDRVQCIPS
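The molecular weight reported for NOV75a can be sanity-checked against its length. The sketch below uses the common rule of thumb of about 110 Da per amino acid residue, which is an assumption introduced here and not a figure from the specification, and it reads the reported value as daltons, since a 600-residue chain is on the order of tens of kilodaltons.

```python
# Rule-of-thumb average residue mass; an assumption, not taken from the source.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_da(residue_count):
    """Rough molecular mass in daltons for a polypeptide of the given length."""
    return residue_count * AVERAGE_RESIDUE_MASS_DA


def relative_difference(reported, estimated):
    """Absolute difference as a fraction of the reported value."""
    return abs(reported - estimated) / reported


estimate = estimate_mass_da(600)  # NOV75a is reported as 600 aa
print(estimate, round(relative_difference(67372.4, estimate), 3))
```

The estimate falls within a few percent of the reported 67372.4, which is consistent with the stated length of 600 residues.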



Further analysis of the NOV75a protein yielded the following properties shown in 
Table 75B. 



Table 75B. Protein Sequence Properties NOV75a 


PSort analysis: 0.6500 probability located in cytoplasm; 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.0442 probability located in microbody (peroxisome)
SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV75a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 75C. 



Table 75C. Geneseq Results for NOV75a 
Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV75a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAR85870 | WD-40 domain-contg. Mus musculus protein - Mus musculus, 816 aa. [WO9521252-A2, 10-AUG-1995] | 95..589 / 333..815 | 295/495 (59%) / 372/495 (74%) | e-179
AAM73935 | Human bone marrow expressed probe encoded protein SEQ ID NO: 34241 - Homo sapiens, 164 aa. [WO200157276-A2, 09-AUG-2001] | 1..157 / 8..164 | 157/157 (100%) / 157/157 (100%) | 2e-87
AAM61216 | Human brain expressed single exon probe encoded protein SEQ ID NO: 33321 - Homo sapiens, 164 aa. [WO200157275-A2, 09-AUG-2001] | 1..157 / 8..164 | 157/157 (100%) / 157/157 (100%) | 2e-87
AAM34114 | Peptide #8151 encoded by probe for measuring placental gene expression - Homo sapiens, 164 aa. [WO200157272-A2, 09-AUG-2001] | 1..157 / 8..164 | 157/157 (100%) / 157/157 (100%) | 2e-87
AAB57007 | Human prostate cancer antigen protein sequence SEQ ID NO: 1585 - Homo sapiens, 214 aa. [WO200055174-A1, 21-SEP-2000] | 408..600 / 22..214 | 144/194 (74%) / 162/194 (83%) | 2e-80



In a BLAST search of public sequence databases, the NOV75a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 75D. 
Table 75D. Public BLASTP Results for NOV75a 

Protein Accession Number | Protein/Organism/Length | NOV75a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
[illegible] | [illegible] (Human), 597 aa. | 1..600 / 1..597 | 471/604 (77%) / [illegible] | 0.0
Q01078 | PROTEIN PC326 - Mus musculus (Mouse), 747 aa. | 95..589 / [illegible] | 295/495 (59%) / [illegible] | e-178
Q9W091 | CG8001 PROTEIN - Drosophila melanogaster (Fruit fly), 748 aa. | 68..587 / 209..711 | 178/533 (33%) / 280/533 (52%) | 1e-77
Q96E00 | UNKNOWN (PROTEIN FOR MGC:9478) - Homo sapiens (Human), 273 aa. | 1..246 / 1..243 | 141/249 (56%) / 173/249 (68%) | 8e-66
Q9M1E5 | HYPOTHETICAL 54.0 KDA PROTEIN - Arabidopsis thaliana (Mouse-ear cress), 481 aa. | 183..536 / 42..419 | 136/382 (35%) / 209/382 (54%) | 2e-62



PFam analysis predicts that the NOV75a protein contains the domains shown in 
Table 75E. 



Table 75E. Domain Analysis of NOV75a 


Pfam Domain | NOV75a Match Region | Identities/Similarities for the Matched Region | Expect Value
WD40: domain 1 of 7 | 188..224 | 13/37 (35%) / 29/37 (78%) | 0.0016
WD40: domain 2 of 7 | 231..269 | 12/39 (31%) / 26/39 (67%) | 11
WD40: domain 3 of 7 | 278..315 | 9/38 (24%) / 24/38 (63%) | 2.2e+02
WD40: domain 4 of 7 | 326..363 | 8/38 (21%) / 27/38 (71%) | 8.8
WD40: domain 5 of 7 | 382..418 | 5/37 (14%) / 27/37 (73%) | 12
WD40: domain 6 of 7 | 429..466 | 6/38 (16%) / 26/38 (68%) | 18
WD40: domain 7 of 7 | 473..509 | 11/37 (30%) / 22/37 (59%) | 0.51
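The seven WD40 repeats in Table 75E differ widely in their individual Expect values. As an illustration, the sketch below lists which repeats pass a per-domain cutoff; the cutoff of 0.01 is a hypothetical choice for the example and is not stated in the specification.

```python
# (domain index, match region, Expect value) taken from Table 75E.
WD40_HITS = [
    (1, "188..224", 0.0016),
    (2, "231..269", 11.0),
    (3, "278..315", 2.2e+02),
    (4, "326..363", 8.8),
    (5, "382..418", 12.0),
    (6, "429..466", 18.0),
    (7, "473..509", 0.51),
]


def domains_below(hits, cutoff):
    """Return the domain indices whose Expect value is below the cutoff."""
    return [index for index, _region, expect in hits if expect < cutoff]


strict = domains_below(WD40_HITS, 0.01)  # hypothetical cutoff
lenient = domains_below(WD40_HITS, 1.0)  # hypothetical cutoff
print(strict, lenient)
```

Only the first repeat passes the stricter cutoff on its own score, so any reading of the full set as WD40 repeats rests on their occurring together rather than on each individual score.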
Example 76. 

The NOV76 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 76A. 



Table 76A. NOV76 Sequence Analysis 




NOV76a, CG59641-01 DNA Sequence | SEQ ID NO: 217 | 7497 bp


ATGGTCTTGCTTCTTTGTCTATCTTGTCTGATTTTCTCCTGTCTGACCTTTTCCTGGT 
TAAAAATCTGGGGGAAAATGACGGACTCCAAGCCGATCACCAAGAGTAAATCAGAAGC 
AAACCTCATCCCGAGCCAGGAGCC CTTTCCAG CCTCTGATAACTCAGGGGAGACACCG 
CAGAG AAATGGGGAGGGCCAC ACT CTGCCCAAG ACACCCAGCCAGGCCG AGCCAG CCT 
CCCACAAAGGCCCCAAAGATGCCGGTCGGCGGAGAAACTCCCTACCACCCTCCCACCA 
GAAGCCCCCAAGAAACCCCCTTTCTTCCAGTGACGCAGCACCCTCCCCAGAGCTTCAA 
GCCAACGGGACTGGGACACAAGGTCTGGAGGCCACAGATACCAATGGCCTGTCCTCCT 
CAGCCAGGCC CCAGGGCCAGCAAGCTGGCTCC CCCTC CAAAGAAGACAAGAAGCAGG C 
AAACATCAAGAGGCAGCTGATGACCAACTTCATCCTGGGCTCTTTTGATGACTACTCC 
TCCGACGAGGACTCTGTTGCTGGCTCATCTCGTGAGTCTACCCGGAAGGGCAGCCGGG 
CCAGCTTGGGGGCCCTGTCCCTGGAGGCTTATCTGACCACAAGGCCGAGCATGTCGGG 
ACTCCACCTGGTGAAGAGGGGACGGGAACACAAGAAGCTGGACCTGCACAGAGACTTT 
ACCGTGGCTTCTCCCGCTGAGTTTGTCACACGCTTTGGGGGGGATCGGGTCATCGAGA 
AGGTGCTTATTGCCAACAACGGGATTGCCGCCGTGAAGTGCATGCGCTCCATCCGCAG 
GTGGGCCTATGAGATGTTCCGCAACGAGCGGGCCATCCGGTTTGTTGTGATGGTGACC 
CCCGAGGACCTTAAGGCCAACGCAGAGTACATCAAGATGGCGGATCATTACGTCCCCG 
TCCCAGGAGGGCCCAATAACAACAACTATGCCAACGTGGAGCTGATTGTGGACATTGC 
CAAGAGAATCCCCGTGCAGGCGGT<3TGGGCTGGCTGGGGCCATGCTTCAGAAAACCCT 
AAACTTCCGGAGCTGCTGTGCAAGAATGGAGTTGCTTTCTTAGGCCCTCCCAGTGAGG 
CCATGTGGG C CTT AGG AGATAAG ATCG CCTCCACCGTTGTCGCCCAG ACG CTACAGGT 
CCCAACCCTG CCCTGG AGTGGAAGCGG CCTG ACAGTGGAGTGGACAGAAGATGATCTG 
CAGCAGGGAAAAAG AATCAGTGT C CCAGAAG ATGTTTATGACAAGGGTTGCGTGAAAG 
ACGTAGATCAGGGCTTGGAGGCAGCAGAAAGAATTGGTTTTCCATTGATGATCAAAGC 
TTCTG AAGGTGGCGG AGGG AAGGGAAT CCGG AAGGCTGAGAGTGCGG AGGACTTC CCG 
ATCCrTTTCAGACAAGTACAGAGTGAGATCCCAGGCTCGCCCATCTTTCTCATGAAGC 
TGGCCCAGCACGCCCGTCACCTGGAAGTTCAGATCCTCGCTGACCAGTATGGGAATGC 

GAAG CACCGGCCACCATCG CCCCGCTGGCCAT ATTCG AGTTCATGGAG CAGTGTGCC A 
TCCGCCTGGCCAAGACCGTGGGCTATGTGAGTGCAGGGACAGTGGAATACCTCTATAG 
TCAGG ATGGCAGCTTCCACTTCTTGGAG CTG AATCCTCG CTTG CAGGTGGAACATCCC 
TG C AC AGAAATGATTGCTG ATGTTAATCTGCCGGCCG CCCAGCTACAG AT CGCCATGG 
GCGTGCCACTGCACCGGCTGAAGGATATCCGGCTTCTGTATGGAGAGTCACCATGGGG 
AGTGACTCCCATTTCTTTTGAAACCCCCTCAAACCCTCCCCTCGCCCGAGGCCACGTC 
ATTGCCGCCAGAATCACCAGCGAAAACCCAGACGAGGGTTTTAAGCCGAGCTCCGGGA 
CTCTCCAGGAACTGAATTTCCGGAGCAGCAAGAACGTGTGGGGTTACTTCAGCGTGGC 
CGCTACTGGAGGCCTGCACGAGTTTGCGGATTCCCAATTTGGGCACTGCTTCTCCTGG 
GGAGAGAACCGGGAAGAGGCCATTTCGAACATGGTGGTGGCTTTGAAGGAACTGTCCA 
TCCGAGGCGACTTTAGGACTACCGTGGAATACCTCATTAACCTCCTGGAGACCGAGAG 
CTTCC^GAACAAOSAC^TCGAC^CCXJGGTGGTTGGACTACCTCATTGCTGAGAAAGTG 
CAGGCGGAGAAACCGGATATCATGCTTGGGGTGGTATGCGGGGCCTTGAACX5TGGCCG 
ATGCGATGTTCAGAACGTGCATGACAGATTTCTTACACTCCCTGGAAAGGGGCCAGGT 
CCTCCCAGCGGATTCACTACTGAACCTCGTAGATGTGGAATTAATTTACGGAGGTGTT 
AAGTACATTCTCAAGGTGGCCCGGCAGTCTCTGACCATGTTCGTTCTCATCATGAATG 
GCTC CCACATCGAGATTG ATGCCCACCGGCTG AATG ATGGGGGG CTCCTGCTCTCCT A 

P A ATfifif! A A P AflPTAPAPPA PPT AP ATfi A AGG AAC! AttflTTG AP AC5TT APPGAATT ACP 
AT CGGCAAT AAG ACGTGTGTGTTTGAGAAGG AG AACGATCCT ACAGTCCTG AGATC CC 
CCTCGGCTGGGAAGCTGACACAGTACACAGTGGAGGATGGGGGCCACGTTGAGGCTGG 
GAG CAGCT ACX3CTGAG ATGGAGGTGATGAAGATGATCATG ACCCTGAACX3TT CAGGAA 
AGAGGCCGGGTG AAGT ACATCAAG CGTCCAGGTGCCGTGCTGG AAGCAGG CTGCGTGG 
TGG CCAGG CTGG AG CT CG ATG AC C CTTCT AAAGTC CACC CGG CTG AAC CG TT CACAGG 
AGAACTCCCTGCCCAGCAGACACTGCCCATCCTCGGAGAGAAACTGCACCAGGTCTTC 
CACAGCGTCCTGGAAAACCTCACCAACGTCATGAGTGGCTT^ 

TTTTTAGCATAAAGCTGAAGGAGTGGGTGCAGAAGCTCATGATGACCCTCCGGCACCC 

GTCACTGCCG CTGCTGGAGCTGC AGGAG ATCATGACC AGCGTGGCAGGCCGCAT CCCC 

GCCCCTGTGGAGAAGTCTGTCCGCAGGGTGATGGCCCAGTATGCCAGCAACATCACCT 

CGGTGCTGTGCCAGTTCCCCAGCCAGCAGATAGCCACCATCCTGGACTGCCATGCAGC 

CACCCTGCAGCGGAAGGCTGATCGAGAGGTCTTCTTCATCAACACCCAGAGCATCX^TG 

CAGTTGGTCCAGAGATACCGCAGCGGGATCCGCGGCTATATGAAAACAGTGGTGTTGG 

ATCTCCTGAGAAGATACTTGCGTGTTGAGAGCAAGGCAAGAGATGCTGATGCCAACAC 

CAGTGGGATGGTGGGGGGCGTCAGGAGCCTGAGCTTTACCTCTGTC 

TCCC CCGAATCCC ACTACG ACAAGTGTGTG ATAAACCT CAGGGAG CAGTTC AAGCCAG 

ACATGTCCCAGGTGCTGGACTGCATCTTCTCCCACGCACAGGTGGCCAAGAAGAACaV 

GCTGGTGATCATGTTGATCGATGAGCTGTGTGGC CC AGACCCTTC CCTGTCGGACG AG 

CTGATCTCCATCCTCAACG AGCTCACTCAGCTG AG CAAAAGCGAG CACTGCAAAGTGG 
CCCTCAGAGCCCGGCAGATCCTGATTGCCTCCCACCTCCCCTCCTACGAGCTGCGGCA 
TAACC AGGTGGAGTCC ATTTTC CTGTCTGC CATTGACATGTACGGCCAC CAGTTCTGC 
CCCGAGAACCTCAAGAAATTAATACTTTCGGAAACAACCATCTTCGACGTCCTGCCTA 
CTTTCTTCTATCACGCAAACAAAGTCGTGTGCATGGCGTCCTTGGAGGTTTACGTGCG 
GAGGGGCTACATCGCCTATGAGTTAAACAGCCTGCAGCACCGGCAGCTCCCGGACGGC 
ACCTGCGTGGTAGAATTCCAGTTCATGCTGCCGTCCTCCCACCCAAACCGGATGACCG 
TGC CCATCAGC ATCAC CAACCCTG ACCTGCTG AGGCACAGCACAG AGCTCTTCATGGA 
CAGCGGCTTCTCCCCACTGTGCCAGCGCATGGGAGCCATGGTAGCCTTCAGGAGATTC 
GAGGACTTCACCAG AAATTTTG ATGAAGTCATCTCTTGCTTCG C CAACGTG CCCAAAG 
ACACCCCCCTCTTCAGCGAGGCC03CACCTCCCTATACTCCGAGGATGACTGCAAGAG 
CCTCAGAGAAGAGCCCATCCACATTCTGAATGTGTCCATCCAGTGTGCAGACCACCTG 
GAGGATGAGGCACTGGTGCCGATTTTACGGACATTCGTACAGTCCAAGAAAAATATCC 
TTGTGGATTATGGACTCCGACGAATCACATTCTTGATTGCCCAAGAGTTTGCAGAAGA 
TCGCATTTACCGTCACTTGGAACCTGCCCTGGCCTTCCAGCTGGAACTTAACCGGATG 
CGTAACTTCGATCTGACCGCCGTGCCCTGTGCCAACCACAAGATGCACCTTTACCTGG 
GTGCTGCCAAGGTGAAGGAAGGTGTGGAAGTGACGGACCATAGGTTCTTCATCCGCGC 
CATCATCAGGCACTCTGACCTGATCACAAAGGAAGCCTCCTTCGAATACCTGCAGAAC 
GAGGGTGAGCGGCTGCTCCTGGAGGCCATGGACGAGCTGGAGGTGGCGTTCAATAACA 
CCAGCGTGCGCACCGACTGCAACCACATCTTCCTCAACTTCGTGCCCACTGTCATCAT 
GGACCCCTTCAAGATCGAGGAGTCCGTGCGCTACATGGTTATGCGCTACGGCAGCCGG 
CTGTGGAAACTCCGTGTGCTACAGGCTGAGGTCAAGATCAACAtCCGCCAGACCACCA 
CCGGC AGTGCCGTTCCCATC CGCCTGTTCATCAC CAATGAGTCGGGCTACTACCTGGA 
CATCAGCCTCTACAAAGAAGTGACTGACTCCAGATCTGGAAATATCATGTTTCACTCC 
TTCGG C AACAAGCAAGGG CCCCAG CACGGG ATGCTG ATC AATACTCCCTACGTCACCA 
AGGATCTGCTCCAGGCCAAGCGATTCCAGGCCCAGACCCTGGGAACCACCTACATCTA 
TGACTTCCCGGAAATGTTCAGGCAGGCAAGTCCGGCGGCTCAGACGCGGGTACATGTG 
CACAATGTG CAGGCTCTCTTTAAACTGTGGGG CTCCCCAGACAAGT ATCCCAAAGACA 
TCCTGACATACACTGAATTAGTGTTGGACTCTCAGGGCCAGCTGGTGGAGATGAACCG 
ACTT CCTGGTGGAAATG AGGTGGGC ATGGTGG C CTTCAAAATG AGGTTTAAGACC CAG 
GAGTACCCGGAAGGACGGGATGTGATCGTCATCGGCAATGACATCACCTTTCGCATTG 
GATCCTTTGGCCCTGGAGAGGACCTTCTGTACCTGCGGGCATCCGAGATGGCCCGGGC 
AGAGGGCATTCCCAAAATTTACGTGGCAGCCAACAGTGGCGCCCGTATTGGCATGGCA 
G AGG AG AT CAAAC ACATGTT C C ACGTGG CTTGGG TGG AC C CAG AAG ACC CC CA C AAAA 
AAAAAAAAACAGTGG CTTT C AGTG CAGGG AACTGG ATTCG T AG C CT CACT AAAGT ATT 
TTTTAAGGGATTTAAATACCTGTACCTGACTCCCCAAGACTACACCAGAATCAGCTCC 
CTGAACTCCGTCCACTGTAAACACATCGAGGAAGGAGGAGAGTCCAGATACATGATCA 
CGGATATCATCGGG AAGG ATGATGG CTTGGGCGTGGAGAATCTGAGGGGCTCAGG CAT 
GATTGCTGGGGAGTCCTCTCTGGCTTACGAAGAGATCGTCACCATTAGCTTGGTGACC 
TG C CGAGC CATTGGGATTGGGGCCT ACTTGGTGAGGCTGGGCCAGCGAGTGATCC AGG 
TGG AG AATTCCCACATCATCCTC AC AGGAG CAAGTG CTCTCAACAAGGT CCTGGG AAG 
AGAGGTCTACACATCCAACAACCAGCTGGGTGGCGTTCAGATCATGCATTACAATGGT 
GTCTCCCACATCACCGTGCCAGATGACTTTGAGGGGGTTTATACCATCCTGGAGTGGC 
TCTCCTATATGCCAAAGGATAATCACAGCCCnSTCCCTATCATCACACCCACTGACCC 
CATTGACAGAGAAATTGAATTCCTCCCATCCAGAGCTCCCTACGACCCCCGGTGGATG 
CTTGCAGGAAGGCCTCACCCAACTCTGAAGGGAACGTGGCAGAGCGGATTCTTTGACC 
ACGGCAGTTTCAAGGAAATCATGGCACCCTGGGCGCAGACCGTGGTGACAGGACGAGC 
AAGGCTTGGGGGGATTCCCGTGGGAGTGATTGCTGTGGAGACACGGACTGTGGAGGTG 
GCAGTCCCTGCAGACCCTG CCAACCTGGATTCTGAGGC CAAG AT AATTCAG CAGG CAG 
GACAGGTGTGGTTCCCAGACTCAGCCTACAAAACCGCCCAGGCCGTCAAGGACTTCAA 
CCGGGAGAAGTTGCCCCTGATGATCTTTGCCAACTGGAGGGGGTTCTCCGGTGGCATG 
AAAG AC ATGTATG ACCAGGTG CTG AAGTTTGG AG CCT ACATCGTGGACGGC CTT AGAC 
AATACAAACAGCCCATCCTGATCTATATCCCGCCCTATGCGGAGCTCCGGGGAGGCTC 
CTGGGTGGTCATAGATGCCACCATCAACCCGCTGTGCATAGAAATGTATGCAGACAAA 
GAGAGCAGGGGTGGTGTTCTGGAACCAGAGGGGACAGTGGAGATTAAGTTCCGAAAGA 
AAGATCTGATAAAGTCCATGAGAAGGATCGATCCAGCTTACAAGAAGCTCATGGAACA 
GCTAGGGGAACCTGATCTCTCCGACAAGGACCGAAAGGACCTGGAGGGCCGGCTAAAG 
GCTCGCGAGGACCTGCTGCTCCCCATCTACCACCAGGTGGCGGTGCAGTTCGCCGACT 
TCCATGACACACCCGGCCGGATGCTGGAGAAGGGCGTCATATCTGACATCCTGGAGTG 
GAAGACCGCACGCACCTTCCTGTATTGGCGTCTGCGCCGCCTCCTCCTGGAGGACCAG 
GTCAAGCAGGAGATCCTGCAGGCCAGCGGGGAGCTGAGTCACGTGCATATCCAGTCCA 
TG CTGCGTCGCTGGTTCGTGGAGACGGAGGGGGCTGTC AAGGC CT ACTTGTGGG ACAA 
CAACCAGGTGGTTGTGCAGTGGCTGGAACAGCACTGGCAGG CAGGGGATGGCCCG CGC 
TCCACCATCCGTGAGAACATCACGTACCTGAAGCACGACTCTGTCCTCAAGACCATCC 
GAGGCCTGGTTG AAGAAAAC C CCGAGGTGGCCGTGGACTGTGTGATATACCTG AGCCA 
GC ACATC AGCCCAGCTG AG CGGG CGCAGGTCGTT C ACCTGCTGTCTACCATGGACAGC 
CCGGCCTCCACCTOA 




ORF Start: ATG at 1 


ORF Stop: TGA at 7495 




NOV76a, CG59641-01 Protein Sequence | SEQ ID NO: 218 | 2498 aa | MW at 280484.4 Da


MVLLLCLSCLIFSCLTFSWLKIWGKMTDSKPITKSKSEANLIPSQEPFPASDNSGETP 
QRNGEGHTLPKTPSQAEPASHKGPKDAGRRRNSLPPSHQKPPRNPLSSSDAAPSPEIiQ 
ANGTGTQGLEATDTNGLS S SARPQGQQAGS PS KEDKKQANI KRQLMTNF ILGS FDDYS 
SDEDSVAGSSRESTRKGSRASLGALSLEAYLTTRPSMSGLHLVKRGREHKKLDLHRDF 
TVAS PAE FVTRFGGDRVI EKVLI ANNGI AAVKCMRS I RRWAYEMFRNERAI RFWMVT 
PEDLKANAEYIKMADHYVPVPGGPNNNNYANVELIVDIAKRIPVQAVWAGWGHASENP 
KLPELLCKNGVAFLGPPSEAMWALGDKIASTVVAQTLQVPTLPWSGSGLTVEWTEDDL
QQGKRISVPEDVYDKGCVKDVDEGLEAAERIGFPLMIKASEGGGGKGIRKAESAEDFP 
ILFRQVQSE I PGS PI FLMKIAQHARHLEVQILADQYGNAVSLFGRDCSIQRRHQKIVE 
EAPATIAPLAIFEFMEQCAIRLAKTVGYVSAGTVEYLYSQDGSFHFLELNPRLQVEHP 
CTEMIADWLPAAQLQIAMGVPLHRLKDIRLLYGESPWGVTPISFETPSNPPLARGHV 
IAARITSENPDEGFKPSSGTVQELNFRSSKNVWGYFSVAATGGLHEFADSQFGHCFSW 
GENREEAISNMWALKELSIRGDFRTTVEYLINLLETESFQNNDIDTGWLDYLIAEKV 
QAEKPDIMIX3WCGAIWADAMFRTCrm>FIiHSLERGQVLPADsLLNLVDVELIYGGV 
KYILKVARQSLTMFVLIMNGCHIEIDAHRLNDGGLLLSYNGNSYTTYMKEEVDSYRIT 
I GNKTCVF EKEND PTVLRS P S AGKLTQ YT VEDGGHVEAGSS YAEME VMKM I MTLNVQE 
RGRVKYIKRPGAVLEAGCWARLELDDPSKVHPAEPFTGELPAQQTLPIU3EKLHQVF 
HSVLENLTWVMSGFCLPEPVFSIKLKEWVQKIJIMTLRHPSLPLLELiQEIMTSVAGRIP 
APVEKSVRRVMAQYASNITSVLCQFPSQQIATHiDCHAATLQRKADREVFFINTQSIV 
QLVQRYRSGIRGYMKTWLDLLRRYLRVESKARDADANTSGMVGGVRSLSFTSVWCFV 
SPESHYDKCVINIjREQFKPDMSQVLDCIFSHAQVAKKNQLVIMLIDELCGPDPSLSDE 
LISILNELTQLSKSEHCKVALRARQILIASHLPSYELRHNQVESIFLSAIDMYGHQFC 
PENLKKLILSETTIFDVLPTFFYHANKVVCMASLEVYVRRGYIAYELNSLQHRQLPDG 
TCWE FQFML P S S H PNRMTVP I S I TNPDL LRHS TELFMDSG FS P LCQRMGAMVAFRRF 
EDFTRNFDEVI SC FANVPKDT PLFSEARTSLYS EDDCKSLREE P IH I LNVS I QCADHL 
EDEALVPILRTFVQSKKNILVDYGLRRITFLIAQEFAEDRIYRHLEPALAFQLELNRM 
RNFDLTAVPCANHKMHLYLGAAKVKEGVEVTDHRFFI RAI I RHSDLI TKEAS FEYLQN 
EGERLLLEAMDELEVAFNNTSVRTDCNHIFLNFVPTVIMDPFKIEESVRYMVMRYGSR 
LWKLRVLQAEVKINIRQTTTGSAVPIRLFITNESGYYLDISLYKEVTDSRSGNIMFHS 
FGNKQGPQHGMLINTPYVTKDLLQAXRFQAQTLGTTYIYDFPEMFRQASPAAQTRVHV 
HNVQALFKLWGSPDKYPKDILTYTELVLDSQGQLVEMNRLPGGNEVGMVAFKMRFKTQ 
EYPEGRDVIVIGNDITFRIGSFGPGEDLLYLRASEMARAEGIPKIYVAANSGARIGMA 
EE I KHMFHVAWVDPED PHKKKKTVAFSAGNWIRSLTKVFFKGFKYLYLTPQDYTRIS S 
LNSVHCKHIEEGGESRYMITDIIGKDDGLGVENLRGSGMIAGESSLAYEEIVTISLVT 
CRAIG IGAYLVRLGQRVIQVENSHI ILTGAS ALNKVLGREVYTSNNQLGGVQI MHYNG 
VSHITVPDDFEGVYTILEWLSYMPKDNHSPVPIITPTDPIDREIEFLPSRAPYDPRWM 
LAGRPH PTLKGTWQSGFFDHGS FKEIMAPWAQTWTGRARLGG I PVGVI AVETRTVEV 
AVPADPANLDSEAKI IQQAGQVWFPDSAYKTAQAVKDFNREKLPLMI FANWRGFSGGM 
KDMYDQVLKFGAY I VDGLRQYKQ P I LI Y I P P YAE LRGGS WWI DAT IN P LC I EMYAD K 
ESRGGVLEPEGTVEI KFRKKDLI KSMRRIDPAYKKLMEQLGEPDLSDKDRKDLEGRLK 
AREDLLLPIYHQVAVQFADFHDTPGRMLEKGVISDILEWKTARTFLYWRLRRLLLEDQ 
VKQ E I LQAS GELS HVH I QS MLRRWFVETEGAVKAYLWDNNQ WVQWLEQHWQ AGDG P R 
STIRENI TYLKHDSVLKTIRGLVEENPEVAVDCVI YLSQHI SPAERAQWHLLSTMDS 
PAST 



Further analysis of the NOV76a protein yielded the following properties shown in 
Table 76B. 



Table 76B. Protein Sequence Properties NOV76a 


PSort analysis: 0.6850 probability located in endoplasmic reticulum (membrane); 0.6400 probability located in plasma membrane; 0.4600 probability located in Golgi body; 0.1000 probability located in endoplasmic reticulum (lumen)
SignalP analysis: Likely cleavage site between residues 25 and 26
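A predicted cleavage site between residues 25 and 26 implies a 25-residue signal peptide and a mature chain beginning at residue 26. The sketch below splits a sequence at such a site; the function name is illustrative, and only the first 30 residues of SEQ ID NO: 218 are used.

```python
def split_signal_peptide(sequence, last_signal_residue):
    """Split a protein at a cleavage site given as the 1-based index of the
    last residue of the signal peptide."""
    return sequence[:last_signal_residue], sequence[last_signal_residue:]


# First 30 residues of the NOV76a protein sequence.
n_terminus = "MVLLLCLSCLIFSCLTFSWLKIWGKMTDSK"
signal, mature_start = split_signal_peptide(n_terminus, 25)
print(signal, mature_start)

# Length of the mature chain for the full 2498-residue protein.
print(2498 - 25)
```

If the prediction holds, the mature polypeptide would span residues 26 to 2498, that is, 2473 residues.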



A search of the NOV76a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 76C. 
Table 76C. Geneseq Results for NOV76a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV76a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAU32848 | Novel human secreted protein #3339 - Homo sapiens, 2486 aa. [WO200179449-A2, 25-OCT-2001] | 26..2498 / 1..2486 | 2316/2555 (90%) / 2339/2555 (90%) | 0.0
AAR05707 | Acetyl-CoA-carboxylase - Gallus sp, 2324 aa. [JP02057179-A, 26-FEB-1990] | 163..2498 / 17..2324 | 1728/2375 (72%) / 2003/2375 (83%) | 0.0
AAB86033 | Bovine acetyl-coenzyme A carboxylase-alpha protein fragment - Bos taurus, 2288 aa. [DE19946173-A1, 05-APR-2001] | 204..2497 / 14..2288 | 1719/2342 (73%) / 1969/2342 (83%) | 0.0
AAR98811 | Erysiphe graminis acetyl coenzyme A carboxylase - Erysiphe graminis f.sp. hordei, 2273 aa. [FR2727129-A1, 24-MAY-1996] | 235..2490 / 42..2271 | 1045/2326 (44%) / 1432/2326 (60%) | 0.0
AAY24150 | Candida albicans acetyl CoA carboxylase - Candida albicans, 2270 aa. [WO9932635-A1, 01-JUL-1999] | 239..2489 / 88..2269 | 1015/2300 (44%) / 1396/2300 (60%) | 0.0



In a BLAST search of public sequence databases, the NOV76a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 76D. 



Table 76D. Public BLASTP Results for NOV76a 


Protein Accession Number | Protein/Organism/Length | NOV76a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
O00763 | Acetyl-CoA carboxylase 2 (EC 6.4.1.2) (ACC-beta) [Includes: Biotin carboxylase (EC 6.3.4.14)] - Homo sapiens (Human), 2483 aa. | 1..2498 / 1..2483 | 2349/2528 (92%) / 2384/2528 (93%) | 0.0
O70151 | ACETYL-COA CARBOXYLASE - Rattus norvegicus (Rat), 2456 aa. | 1..2497 / 1..2455 | 2068/2524 (81%) / 2224/2524 (87%) | 0.0
CAA48770 | ACETYL-COA CARBOXYLASE (EC 6.4.1.2) - Homo sapiens (Human), 2339 aa. | 163..2498 / 17..2339 | 1921/2390 (80%) / 2086/2390 (86%) | 0.0
P11029 | Acetyl-CoA carboxylase (EC 6.4.1.2) (ACC) [Includes: Biotin carboxylase (EC 6.3.4.14)] - Gallus gallus (Chicken), 2324 aa. | 163..2498 / 17..2324 | 1732/2375 (72%) / 2004/2375 (83%) | 0.0
P11497 | Acetyl-CoA carboxylase 1 (EC 6.4.1.2) (ACC-alpha) [Includes: Biotin carboxylase (EC 6.3.4.14)] - Rattus norvegicus (Rat), 2345 aa. | 163..2497 / 17..2345 | 1736/2396 (72%) / 1993/2396 (82%) | 0.0



PFam analysis predicts that the NOV76a protein contains the domains shown in 
Table 76E. 



Table 76E. Domain Analysis of NOV76a 


Pfam Domain | NOV76a Match Region | Identities/Similarities for the Matched Region | Expect Value
CPSase_L_chain: domain 1 of 1 | 249..372 | 49/132 (37%) / 117/132 (89%) | 2.2e-57
CPSase_L_D2: domain 1 of 1 | 374..619 | 102/253 (40%) / 218/253 (86%) | 6.6e-118
Biotin_carb_C: domain 1 of 1 | 640..747 | 40/118 (34%) / 100/118 (85%) | 1.9e-53
biotin_lipoyl: domain 1 of 1 | 885..951 | 22/75 (29%) / 56/75 (75%) | 6.5e-17
Carboxyl_trans: domain 1 of 2 | 1783..1878 | 31/100 (31%) / 88/100 (88%) | 7.4e-34
GTP cyclohydrol: domain 1 of 1 | 2287..2304 | 6/18 (33%) / 13/18 (72%) | 6.6
Carboxyl_trans: domain 2 of 2 | 1897..2374 | 191/504 (38%) / 447/504 (89%) | 4.1e-258



Example 77. 



The NOV77 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 77A. 



Table 77A. NOV77 Sequence Analysis 




NOV77a, CG59630-01 DNA Sequence | SEQ ID NO: 219 | 1624 bp


CGCGCGCCGGGGATGGAGCCGCAGCCCGGCGGCGCCCGGAGCTGCCGGCGCGGGGCCC 
CCGGCGGCGCCTGCGAGCTGGGCCCGGCGGCCGAGGCGGCGCCCATGAGCCTCGCCAT 
CCACAG CACCACGGGCACCCGCTACGACCTGGCCGTG C CGCCCGACGAG ACGGTGG AG 
GGGCTGCGCAAGCGGTTGTCCCAGCGCCTCAAAGTGCCCAAGGAGCGCCTGGCTCTTC 
TCCACAAAGACACGCGGCTCAGTTCGGGGAAGCTGCAGGAGTTCGGCGTGGGTGATGG 
CAGCAAGCTGACCTTGGTACCCACCGTGGAAGCGGGCCTCATGTCTCAGGCCTCAAGG 
CCGGAACAGTCCGTGATGCAAG CTCTCGAGAGTCTCACGGAG ACG CAGCCCCC AG CGG 
CGCCCGGGCCGGGCCGGGCTGGCGGAGGAGGCTTCCGGAAATACAGATTCATTTTATT 
TAAGCGTCCGTGGCACCGACAGGGACCCCAGAGCCCAGAGAGGGG CGGCGAGAGG C CC 
CAGGTCAGTGACTTCCTGTCGGGCCGTTCGCCACTGACACTGGCCTTGCGTGTGGGCG 
ACCACATGATGTTCGTGCAGCTGCAGCTCGCGGCCCAGCACGCTCCACTGCAACACCG 
CCATGTGCTGGCCGCTGCGGCCGCCGCCGCTGCTGCGCGGGGGGACCCCAGCATAGCC 
TCCCCCGTGTCCTCGCCCTGCCGGCCGGTGTCCAGTGCCGCCCGAGTCCCCCCGGTGC 
CCACCAGCCCGTCCCCTGCATCTCCCTCGCCCATCACAGCCGGCTCCTTCCGGTCCCA 
CGCAGCCTCCACCACCTGCCCGGAGCAGATGGACTGCTCCCCCACGGCCAGCAGCAGT 
GCCAGTCCTGGTGCCAGCACCACGTCTACCCCAGGGGCCAGCCCTGCCCCCCGCTCCC 
GAAAACCCGGCGCCGTCATCGAGAGCTTTGTGAATCACGCCCCGGGGGTCTTCTCAGG 
GACCTTCTCTGGCACGCTACACCCCAACTGCCAAGACAGCAGCGGGCGGCCX3CGGCGT 
GACATCGGCACCATCCTGCAGATCCTGAACGACCTCCTGAGCGCCACCCGGCACTACC 
AGGGCATGCCCCCTTCGCTGGCCCAGCTCCGCTGCCACGCCCAGTGCTCCCCGGCCTC 
ACCGGCCCCCGACCTGGCCCCCAGAACTACCTCCTGCGAGAAGCTCACGGCTGCCCCC 
TCAGCCTCCCTGCTCCAGGGCCAGAGCCAGATCCGCATGTGCAAGCCCCCGGGTGACC 
GGCTT CGGCAGACAGAAAACCGCGCCACGCG CTGCAAGGTGGAACGGCTGCAGCTGCT 
TCTGCAGCAGAAACGGCTCCGTAGAAAGGCCCGGCGGGACGCGCGGGGTCCGTACCAC 
TGGTCACCC AGCCGCAAGGCCGG CCGCAG CG ACAGCAGTAG C AGCGGGGGCGG CGGCA 
GCCCCAGCGAGGCCTCCGGCTTGGGCCTCGACTTCGAGGACTCCGTGTGGAAGCCAGA 
AGTCAACCCTGACATCAAGTCAGAGTTCGTGGTGGCTTAGGATCTTCGGATCGGCCAC 
CCTCGCCCCTCGCACCCCAGCCCAGGGCGGCGGGGACTCCGAGAGCCCCGGAGAGAAC 




ORF Start: ATG at 13 


ORF Stop: TAG at 1546 




NOV77a, CG59630-01 Protein Sequence | SEQ ID NO: 220 | 511 aa | MW at 53949.3 Da


MEPQPGGARSCRRGAPGGACELGPAAEAAPMSLAIHSTTGTRYDLAVPPDETVEGLRK
RLSQRLKVPKERLALLHKDTRLSSGKLQEFGVGDGSKLTLVPTVEAGLMSQASRPEQS
VMQALESLTETQPPAAPGPGRAGGGGFRKYRFILFKRPWHRQGPQSPERGGERPQVSD
FLSGRSPLTLALRVGDHMMFVQLQLAAQHAPLQHRHVLAAAAAAAAARGDPSIASPVS
SPCRPVSSAARVPPVPTSPSPASPSPITAGSFRSHAASTTCPEQMDCSPTASSSASPG
ASTTSTPGASPAPRSRKPGAVIESFVNHAPGVFSGTFSGTLHPNCQDSSGRPRRDIGT
ILQILNDLLSATRHYQGMPPSLAQLRCHAQCSPASPAPDLAPRTTSCEKLTAAPSASL
LQGQSQIRMCKPPGDRLRQTENRATRCKVERLQLLLQQKRLRRKARRDARGPYHWSPS
RKAGRSDSSSSGGGGSPSEASGLGLDFEDSVWKPEVNPDIKSEFVVA
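Unlike NOV74a, the NOV77a ORF does not start at the first base, so the reported start coordinate has to be read as a 1-based offset into the cDNA. The sketch below locates the first ATG in the opening bases of SEQ ID NO: 219 and checks that the reported start and stop positions are in frame; the helper names are hypothetical, and the stop position is assumed to mark the first base of the stop codon.

```python
def first_atg_position(dna):
    """1-based position of the first ATG in the sequence, or None if absent."""
    index = dna.find("ATG")
    return index + 1 if index >= 0 else None


def in_frame_residues(start, stop):
    """Residue count between a 1-based start codon position and the 1-based
    first base of the stop codon, or None if the two are not in frame."""
    coding = stop - start
    return coding // 3 if coding % 3 == 0 else None


# Opening bases of the NOV77a cDNA (SEQ ID NO: 219).
five_prime = "CGCGCGCCGGGGATGGAGCCGCAGCCC"
print(first_atg_position(five_prime))
# ORF Start: ATG at 13, ORF Stop: TAG at 1546, as reported.
print(in_frame_residues(13, 1546))
```

Under these assumptions the first ATG falls at position 13 and the coordinates give 511 residues, in agreement with the reported start codon and the length of SEQ ID NO: 220.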



Further analysis of the NOV77a protein yielded the following properties shown in 
Table 77B. 



Table 77B. Protein Sequence Properties NOV77a 


PSort analysis: 0.3000 probability located in microbody (peroxisome); 0.3000 probability located in nucleus; 0.1526 probability located in lysosome (lumen); 0.1000 probability located in mitochondrial matrix space
SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV77a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 77C. 
Table 77C. Geneseq Results for NOV77a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV77a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAB56832 | Human prostate cancer antigen protein sequence SEQ ID NO: 1410 - Homo sapiens, 236 aa. [WO200055174-A1, 21-SEP-2000] | 267..493 / 1..227 | 189/227 (83%) / 195/227 (85%) | e-104



In a BLAST search of public sequence databases, the NOV77a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 77D. 



Table 77D. Public BLASTP Results for NOV77a 


Protein Accession Number | Protein/Organism/Length | NOV77a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q9JJJ6 | MIDNOLIN - Mus musculus (Mouse), 508 aa. | 1..511 / 1..508 | 475/514 (92%) / 486/514 (94%) | 0.0
Q96BW8 | SIMILAR TO MIDNOLIN - Homo sapiens (Human), 177 aa (fragment). | 338..511 / 4..177 | 174/174 (100%) / 174/174 (100%) | 2e-97
Q9W2S4 | CG9732 PROTEIN - Drosophila melanogaster (Fruit fly), 989 aa. | 213..363 / 524..677 | 58/155 (37%) / 80/155 (51%) | 6e-18
AAL40834 | BPLF1 - Human herpesvirus 4 (Epstein-Barr virus), 3179 aa. | 200..406 / 320..530 | 64/223 (28%) / 95/223 (41%) | 2e-07
Q9BKV7 | PPG3 - Leishmania major, 1325 aa. | 213..328 / 984..1104 | 37/121 (30%) / 66/121 (53%) | 2e-06



PFam analysis predicts that the NOV77a protein contains the domains shown in 
Table 77E. 
Table 77E. Domain Analysis of NOV77a 

Pfam Domain | NOV77a Match Region | Identities/Similarities for the Matched Region | Expect Value
ubiquitin: domain 1 of 1 | 31..99 | 19/79 (24%) / 46/79 (58%) | 0.00033
PI3_PI4_kinase: domain 1 of 1 | 411..427 | 7/18 (39%) / 14/18 (78%) | 1.5



Example 78. 

The NOV78 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 78A. 



Table 78A. NOV78 Sequence Analysis 




NOV78a, CG59561-01 DNA Sequence | SEQ ID NO: 221 | 1034 bp


CCACCGCCACAGCTGCCAGCATGTCTGGCCCAGACATCAAGACGCCGACCGCCATCCA 
GATCTGCCGGATTATGCGGGACGCTAATGTGGCCCGCAATGTCTACGGCGGGACCATC 
CTGAAGATGATCAAAGAGGCGGGCGCCATCATCAGCACCCGGCATTGCAATCCGCAGA 
ACGGGGATCGCTGTGTGGCCGCTCTGGCTCGGGTCGAGTGCACCCACTTCCTGTGGCC
CATGTGCATCGGTGAGGTGGCCCACGTCAGCGCGGAGATCACCTACACCTCCAAGCAC 
TCTGTGGAGGTGCAGGTCAACATGATGTCCGAAAACATCCTCACAGGTGCCAAAAAGC
TGACCAATAAGGCCACCCTCTGGTATGCGCCCCTGTCGCTGACGAACGTGGACAAGGT 
CCTCGAAGAGCCTCCTGTTGTGTATTTCCGGCAGGAGCAGGAGGAGGAGGGCCAGAAG 
CGGTACAAAACCCAGAAGCTGGAGCGCATGGAGACCAACTGGAGGAACGGGGACATCG
TCCAGCCAGTCCTCAACCCAGAGCCGAACACTGTCAGCTACAGCCAGTCCAGCTTGAT
CCACCTGGTGGGGCCTTCAGACTGTACCCTGCACAGCTTCGTGCATGAAGGGGTGACC 
ATGAAGGTCATGGACGAGGTCGCCGGGATCTTGGCTCCACGCCACTGCAAGACCAACC 
TCGTCACAGCCTCCATGGAGGCCATTAATTTTGACAACAAGATCAGAAAAGGCTGCAT
CAAGACCATCTCCGGACGCATGACCTTCACGAGCAATAAGTCCGTAGAGATCGAGGTC 
TTGGTGGATGCCGACTGTGTTGTGGACAGCTCTCAGAAGCGCTACAGGGCCGCCAGTG 
TCTTCACCTATGTGTCGCTGAGCCAGGAAGGCAGGTCGCTGCCCATGCCCCAGCTCGT 
GCCGGAGACCCAGGACGAGAAGGGCTTTGAGGCCTGGCTCGGTGGCTCACGCCTATAA 
TCCCAGCACTTTAGGATGCTGAGGCAGGCGGATCACTTGACGTCAGGA




ORF Start: ATG at 21 


ORF Stop: TAA at 984 




SEQ ID NO: 222 


321 aa 


MW at 35738.7 Da


NOV78a, 

CG59561-01 Protein Sequence 


MSGPDIKTPTAIQICRIMRDANVARNVYGGTILKMIKEAGAIISTRHCNPQNGDRCVA 
ALARVECTHFLWPMCIGEVAHVSAEITYTSKHSVEVQVNMMSENILTGAKKLTNKATL
WYAPLSLTNVDKVLEEPPVVYFRQEQEEEGQKRYKTQKLERMETNWRNGDIVQPVLNP
EPNTVSYSQSSLIHLVGPSDCTLHSFVHEGVTMKVMDEVAGILAARHCKTNLVTASME
AINFDNKIRKGCIKTISGRMTFTSNKSVEIEVLVDADCVVDSSQKRYRAASVFTYVSL
SQEGRSLPMPQLVPETQDEKGFEAWLGGSRL 
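

The ORF coordinates and the protein length reported in the sequence tables can be cross-checked arithmetically. The sketch below is an illustration only; it assumes that "ORF Start" is the 1-based position of the first base of the ATG and that "ORF Stop" is the 1-based position of the first base of the stop codon, which is how the values in Table 78A appear to be used. The function name is hypothetical.

```python
def orf_protein_length(orf_start: int, orf_stop: int) -> int:
    """Number of amino acids encoded between the start codon (inclusive)
    and the stop codon (exclusive), given 1-based codon start positions."""
    span = orf_stop - orf_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("coordinates do not describe an in-frame ORF")
    return span // 3

if __name__ == "__main__":
    # NOV78a: ORF Start ATG at 21, ORF Stop TAA at 984, reported length 321 aa.
    print(orf_protein_length(21, 984))
```

A mismatch between the computed and the reported length would point to a transcription error in one of the three values.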



Further analysis of the NOV78a protein yielded the following properties shown in 
Table 78B. 
Table 78B. Protein Sequence Properties NOV78a


PSort 
analysis: 


0.8000 probability located in microbody (peroxisome); 0.1000 probability located 
in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.0000 probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
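

The PSort cell lists candidate locations with probabilities separated by semicolons. As a hedged illustration, the sketch below parses that text form and reports the highest-scoring location. The regular expression and function name are assumptions made for this example and are not part of the PSort program itself.

```python
import re

def parse_psort(text: str) -> dict[str, float]:
    """Map each location named in a PSort summary string to its probability."""
    pattern = re.compile(r"(\d\.\d+) probability located in ([^;]+)")
    return {loc.strip(): float(p) for p, loc in pattern.findall(text)}

# Text of the PSort cell for NOV78a, joined onto one logical line.
psort_text = (
    "0.8000 probability located in microbody (peroxisome); "
    "0.1000 probability located in mitochondrial matrix space; "
    "0.1000 probability located in lysosome (lumen); "
    "0.0000 probability located in endoplasmic reticulum (membrane)"
)

scores = parse_psort(psort_text)
best = max(scores, key=scores.get)

if __name__ == "__main__":
    print(best, scores[best])
```

The listed probabilities are independent scores rather than a distribution, so they need not sum to one.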



A search of the NOV78a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 78C. 



Table 78C. Geneseq Results for NOV78a 


Geneseq
Identifier


Protein/Organism/Length [Patent #, 
Date] 


NOV78a
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAW /4oVO 


Human secreted protein encoded by
gene 169 clone HPTTU11 - Homo
sapiens, 339 aa. [WO9839448-A2,
11-SEP-1998]


1..313 


11 51515 (a/vo) 
292/313(93%) 


e-154


AAY71115 


Human Hydrolase protein-13
(HYDRL-13) - Homo sapiens, 375 aa. 
[WO200028045-A2, 18-MAY-2000] 


1..310 
33..316 


247/313 (78%) 
266/313 (84%) 


e-133 


AAY35275


Chlamydia pneumoniae 
transmembrane protein sequence - 
Chlamydia pneumoniae, 155 aa. 
[WO9927105-A2, 03-JUN-1999] 


187..310 
16..138 


35/124 (28%) 
72/124 (57%) 


1e-09


AAG92590


C glutamicum protein fragment SEQ 
ID NO: 6344 - Corynebacterium 
glutamicum, 339 aa. [EP1108790-A2,
20-JUN-2001] 


24..309 
35..307 


69/296 (23%) 
112/296(37%) 


7e-08 


AAB76624


Corynebacterium glutamicum MCT 
protein SEQ ID NO:230 - 
Corynebacterium glutamicum, 339 aa. 
[WO200100805-A2, 04-JAN-2001] 


24..309 
35..307 


69/296 (23%) 
112/296 (37%)


7e-08 



In a BLAST search of public sequence databases, the NOV78a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 78D. 



Table 78D. Public BLASTP Results for NOV78a 
Protein
Accession 
Number 


Protein/Organism/Length 


NOV78a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


O00154


Cytosolic acyl coenzyme A thioester 
hydrolase (EC 3.1.2.2) (Long chain acyl- 
CoA thioester hydrolase) (CTE-II) 
(Brain acyl-CoA hydrolase) (BACH) - 
Homo sapiens (Human), 338 aa. 


1..310 
1..313 


274/313(87%) 
293/313(93%) 


e-154 


Q91V12 


ACYL-COA HYDROLASE 
(HYPOTHETICAL 37.6 KDA 
PROTEIN) - Mus musculus (Mouse), 
338 aa. 


1..310 
1..313 


265/313(84%) 
287/313 (91%) 


e-150 


Q64559 


Cytosolic acyl coenzyme A thioester 
hydrolase (EC 3.1.2.2) (Long chain acyl- 
CoA thioester hydrolase) (CTE-II) 
(Brain acyl-CoA hydrolase) (BACH) 
(ACT) (LACH1) (ACH1) - Rattus
norvegicus (Rat), 338 aa. 


1..310 
1..313 


263/313 (84%) 
286/313(91%) 


e-149 


JC5416 


palmitoyl-CoA hydrolase (EC 3.1.2.2),
hepatic - rat, 343 aa.


12..310
17..318 


251/302(83%) 
276/302 (91%) 


e-142 


Q9Y541 


DJ202O8.3.1 (HBACH (BRAIN ACYL- 
COA HYDROLASE (ACYL 
COENZYME A THIOESTER 
HYDROLASE, EC 3.1.2.2)) (ISOFORM
1)) - Homo sapiens (Human), 237 aa 
(fragment). 


1..202 
33..236 


181/204(88%) 
190/204 (92%) 


e-100 
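

Expect values in these tables use BLAST shorthand, in which an entry such as e-154 stands for 1e-154 and 0.0 denotes a value too small to be reported. The following sketch, offered as an assumption-labelled convenience rather than as part of the original analysis, converts such cells to floating-point numbers so that rows can be sorted or filtered. The function name is hypothetical.

```python
def parse_expect(cell: str) -> float:
    """Convert a BLAST-style Expect cell ('e-154', '2e-97', '0.0') to a float."""
    text = cell.strip()
    if text.startswith("e"):
        # Shorthand with an implied mantissa of 1, e.g. 'e-154' means '1e-154'.
        text = "1" + text
    return float(text)

if __name__ == "__main__":
    cells = ["e-154", "e-150", "e-149", "e-142", "e-100"]
    values = [parse_expect(c) for c in cells]
    # Smaller Expect values indicate more significant matches.
    print(sorted(values) == values)
```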



PFam analysis predicts that the NOV78a protein contains the domains shown in
Table 78E.



Table 78E. Domain Analysis of NOV78a 


Pfam Domain 


NOV78a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


Acyl-CoA hydro: domain 1 
of 1 


165..305 


46/147 (31%) 
131/147 (89%) 


1.1e-47



Example 79. 



The NOV79 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 79A. 
Table 79A. NOV79 Sequence Analysis




SEQ ID NO: 223 


4203 bp 


NOV79a, 

CG59452-01 DNA Sequence


AATGTGATGGGATCACTAGCATGTCTGCGGAGAGCGGCCCTGGGACGAGATTGAGAAA
TCTGCCAGTAATGGGGGATGGACTAGAAACTTCCCAAATGTCTACAACACAGGCCCAG 
GCCCAACCCCAGCCAGCCAACGCAGCCAGCACCAACCCCCCGCCCCCAGAGACCTCCA 
ACC CTAACAAG CCCAAGAGGCAG ACCAAC CAACTG CAAT ACCTGCTCAG AGTGGTGCT 
CAAGACACTATGGAAACACCAGTTTGCATGGCCTTTCCAGCAGCCTGTGGATGCCGTC 
AAGCTGAACCTCCCTGATTACTATAAGATCATTAAAACGCCTATGGATATGGGAACAA
TAAAGAAGCGCTTGGAAAACAACTATTACTGGAATGCTCAGGAATGTATCCAGGACTT 
CAACACTATGTTTACAAATTGTTACATCTACAACAAGCCTGGAGATGACATAGTCTTA 
ATGGCAG AAG CTCTGGAAAAGCTCTT CTTG C AAAAAATAAATGAGCTACCCACAG AAG 
AAACCG AG ATCATG AT AGTC CAGG C AAAAGG AAG AGG ACGTGGGAGG AAAGAAACAGG 
TACAGCAAAACCTGGCGTTTCCACGGTACCAAACACAACTCAAGCATCGACTCCTCCG 
CAGACCCAGACCCCTCAGCCGAATCCTCCTCCTGTGCAGGCCACGCCTCACCCCTTCC 
CTGCCGTCACCCCGGACCTCATCGTCCAGACCCCTGTCATGACAGTGGTGCCTCCCCA 
GCCACTGCAGACGCCCCCGCCAGTGCCCCCCCAGCCACAACCCCCACCCGCTCCAGCT 
CCCCAGCCCGTACAGAGCCACCCACCCATCATCGCGGCCACCCCACAGCCTGTGAAGA
CAAAGAAGGGAGTGAAGAGGAAAGCAGACACCACCACCCCCACCACCATTGACCCCAT 
TCACGAGCCACCCTCGCTGCCCCCGGAGCCCAAGACCACCAAGCTGGGCCAGCGGCGG 
GAG AG CAGCCGG CCTGTGAAACCTCCAAAGAAGG ACGTGCCCGACTCTCAGCAGCACC 
CAGCACCAGAGAAGAGCAGCAAGGTCTCGGAGCAGCTCAAGTGCTGCAGCGGCATCCT 
CAAGGAGATGTTTGCCAAGAAGCACGCCGCCTACGCCTGGCCCTTCTACAAGCCTGTG 
GACGTGGAGGCACTGGGCCTACACGACTACTGTGACATCATCAAGCACCCCATGGACA 
TGAGCACAATCAAGTCTAAACTGGAGGCCCGTGAGTACCGTGATGCTCAGGAGTTTGG 
TGCTG ACGT C CG ATTG ATGTTCTCCAACTG CTATAAGTACAACCCTCCTG ACCATGAG 
GTGGTGGCCATGGCCCGCAAGCTCCAGGATGTGTTCGAAATGCGCTTTGCCAAGATGC 
CGGACGAGCCTGAGGAGCCAGTGGTGGCCGTGTCCTCCCCGGCAGTGCCCCCTCCCAC 
CAAGGTTGTGG CCCCGCCCTCATCCAGCGACAGC AGCAG CG ATAGCTCCTCGGACAGT 
GACAGTTCGACTGATGACTCTCAGGAGGAGCGAGCCCAGCGGCTGGCTGAGCTCCAGG 
AGCAGCTCAAAGCCGTGCACGAGCAGCTTGCAGCCCTCTCTCAGCCCCAGCAGAACAA 
ACCAAAG AAAAAGG AGAAAGAC AAGAAGGAAAAGAAAAAAGAAAAG CACAAAAGG AAA 
GAGGAAGTGGAAGAGAATAAAAAAAGCAAAGCCAAGGAACCTCCTCCTAAAAAGACGA 
AGAAAAATAATAGCAGCAACAGCAATGTGAGCAAGAAGGAGCCAGCGCCCATGAAGAG 
CAAGCCCCCTCCCACGTATGAGTCGGAGGAAGAGGACAAGTGCAAGCCTATGTCCTAT 
G AGGAG AAGCGGGAG CT C AG CTTGGACATCAACAAGCTC C CCGGCG AG AAGCTGGGCC 
GCGTGGTGCAC ATCATC C AGTC ACGGG AGCCCTCCCTGAAGAATTCCAACCC CGACGA 
GATTGAAATCGACTTTGAGACCCTGAAGCCGTCCACACTGCGTGAGCTGGAGCGCTAT 
GTCAC CTCCTGTTTGCGG AAGAAAAGG AAACCTCAAG CTG AGAAAGTTG ATGTGATTG 
CCGGCTCCTCCAAGATGAAGGGCTTCTCGTCCTCAGAGTCGGAGAGCTCCAGTGAGTC 
CAGCTCCTCTGACAGCGAAGACTCCGAAACAGAGATGGCTCCGAAGTCAAAAAAGAAG 
GGG CACC CCGGG AGGG AG CAG AAGCAGCACCATCATCACCACCATCAGCAG ATG CAGC 
AGGCCCCGGCTCCTGTGCCCCAGCAGCCGCCCCCGCCTCCCCAGCAGCCCCCACCGCC 
TCCACCTCCGCAGCAGCAACAGCAGCCGCCACCCCCGCCTCCCCCACCCTCCATGCCG 
CAGCAGGCAGCCCCGGCGATGAAGTCCTCGCCCCCACCCTTCATTGCCACCCAGGTGC 
CCGTCCTGGAGCCCCAGCTCCCAGGCAGCGTCTTTGACCCCATCGGCCACTTCACCCA 
GCCCATCCTGCACCTGCCGCAGCCTGAGCTGCCCCCTCACCTGCCCCAGCCGCCTGAG 
CACAGCACTCCACCCCATCTCAACCAGCACGCAGTGGTCTCTCCTCCAGCTTTGCACA 
ACGCACTACCCCAGCAGCCATCACGGCCCAGCAACCGAGCCGCTGCCCTGCCTCCCAA 
GCCCGCCCGGCCCCCAGCCGTGTCACCAGCCTTGACCCAAACACCCCTGCTCCCACAG
CCCCCCATGGCCCAACCCCCCCAAGTGCTGCTGGAGGATGAAGAGCCACCTGCCCCAC 
CCCT CACCTCCATGCAG ATGCAGCTGT ACCTG CAGC AGCTGCAG AAGGTG CAG CC CCC 
TACGCCGCTACTCCCTTCCGTGAAGGTGCAGTCCCAGCCCCCACCCCCCCTGCCGCCC 
CCACCCCACCCCTCTGTGCAGCAGCAGCTGCAGCAGCAGCCGCCACCACCCCCACCAC 
CCCAGCCCCAGCCTCCACCCCAGCAGCAGCATCAGCCCCCTCCACGGCCCGTGCACTT 
GCAGCCCATG C AGTTTTC CACCC ACATCCAACAGCCCCCG CC ACCCCAGGGCCAGCAG 
CCCCCCCATCCGCCCCCAGGCCAGCAGCCACCCCCGCCGCAGCCTGCCAAGCCTCAGC 
AAGTCATCCAGCACCACCATTCACCCCGGCACCACAAGTCGGACCCCTACTCAACCGG 
TCACCTCCG CG AAGCCCCCT CCCCGCTTATGATACATTC C CCCCAG ATGTCACAGTTC 
CAGAGCCTG ACCC ACCAGTCTCCACCCCAG CAAAACGTC CAGCCTAAGAAACAGGTAA 
CTGGCAGGGCTGGGCCAAGTCCTGTGGGCCAGGGCCGGGGGTGCCTGCCCACCTCACC 
GGCCGCTGTGCCTGTGCCATCCCAGGAGCTGCGTGCTGCCTCCGTGGTCCAG CCCCAG 
CCCCTCGTGGTGGTGAAGGAGGAGAAGATCCACTCACCCATCATCCGCAGCGAGCCCT 
TCAGCCCCTCGCTGCGGCCGGAG CCCCCC AAGCACCCGGAG AGCATCAAGGC CCCCGT 
TTATGTTCCAGGGCCGGAAATGAAGCCTGTGGATGTCGGGAGGCCTGTGATCCGGCCC 
C CAG AG CAGAACGCACCG CCACCAGGGGCCCCTGACAAGG AC AAACAG AAACAGG AGC 
CGAAGACTCCAGTTGCGCCCAAAAAGGACCTGAAAATCAAGAACATGGGCTCCTGGGC 
CAGCCTAGTGCAGAAGCATCCGACCACCCCCTCCTCCACAGCCAAGTCATCCAGCGAC 
AGCTTCGAGCAGTTCCGCCGCGCCGCTCGGGAGAAAGAGGAGCGTGAGAAGGCCCTGA 
AGGCT CAGGCCG AG CACG CTGAG AAGG AG AAGG AG CGG CTGCGG CAGG AG CGCATG AG 
G AGCCGAGAGG ACG AGGATGCGCTGG AG CAGGCCCGGCGGG CCCATG AGGAGGC ACGT 
CGGCGCC AGG AG CAGCAG CAGCAG CAG CG C CAGG AGC AAC AG CAG CAG CAGC AACAGC 
AAGCAGCTGCGGTGGCTGCCGCCGCCACCCCACAGGCCCAGAGCTCCCAGCCCCAGTC 
CATG CTGG AC CAG CAG AGGG AG TTGG C C CGG AAG CGGGAG CAGG AG CG AAG ACG CCGG 
G AAG CCATCGCAGCT ACC ATTG AC ATG AATTTCCAGAGTGATCT ATTG TCAATATTTG 
AAGAAAATCTTTTCTGAGCGCACCTAG 
ORF Start: ATG at 21


ORF Stop: TGA at 4191 




SEQ ID NO: 224


1390 aa


MW at 154728.4 Da


NOV79a, 

CG59452-01 Protein Sequence 


MSAESGPGTRLRNLPVMGDGLETSQMSTTQAQAQPQPANAASTNPPPPETSNPNKPKR 
QTNQLQYLLRVVLKTLWKHQFAWPFQQPVDAVKLNLPDYYKIIKTPMDMGTIKKRLEN 
NYYWNAQECIQDFNTMFTNCYIYNKPGDDIVLMAEALEKLFLQKINELPTEETEIMIV
QAKGRGRGRKETGTAKPGVSTVPNTTQASTPPQTQTPQPNPPPVQATPHPFPAVTPDL
IVQTPVMTVVPPQPLQTPPPVPPQPQPPPAPAPQPVQSHPPIIAATPQPVKTKKGVKR
KADTTTPTTIDPIHEPPSLPPEPKTTKLGQRRESSRPVKPPKKDVPDSQQHPAPEKSS 
KVSEQLKCCSGILKEMFAKKHAAYAWPFYKPVDVEALGLHDYCDIIKHPMDMSTIKSK 
LEAREYRDAQEFGADVRLMFSNCYKYNPPDHEVVAMARKLQDVFEMRFAKMPDEPEEP 
VVAVSSPAVPPPTKVVAPPSSSDSSSDSSSDSDSSTDDSEEERAQRLAELQEQLKAVH
EQLAALSQPQQNKPKKKEKDKKEKKKEKHKRKEEVEENKKSKAKEPPPKKTKKNNSSN
SNVSKKEPAPMKSKPPPTYESEEEDKCKPMSYEEKRQLSLDINKLPGEKLGRVVHIIQ
SREPSLKNSNPDEIEIDFETLKPSTLRELERYVTSCLRKKRKPQAEKVDVIAGSSKMK
GFSSSESESSSESSSSDSEDSETEMAPKSKKKGHPGREQKQHHHHHHQQMQQAPAPVP 
QQPPPPPQQPPPPPPPQQQQQPPPPPPPPSMPQQAAPAMKSSPPPFIATQVPVLEPQL 
PGSVFDPIGHFTQPILHLPQPELPPHLPQPPEHSTPPHLNQHAVVSPPALHNALPQQP
SRPSNRAAALPPKPARPPAVSPALTQTPLLPQPPMAQPPQVLLEDEEPPAPPLTSMQM 
QLYLQQLQKVQPPTPLLPSVKVQSQPPPPLPPPPHPSVQQQLQQQPPPPPPPQPQPPP 
QQQHQPPPRPVHLQPMQFSTHIQQPPPPQGQQPPHPPPGQQPPPPQPAKPQQVIQHHH 
SPRHHKSDPYSTGHLREAPSPLMIHSPQMSQFQSLTHQSPPQQNVQPKKQVTGRAGPS
PVGQGRGCLPTSPAAVPVPSQELRAASVVQPQPLVVVKEEKIHSPIIRSEPFSPSLRP
EPPKHPESIKAPVYVPGPEMKPVDVGRPVIRPPEQNAPPPGAPDKDKQKQEPKTPVAP
KKDLKIKNMGSWASLVQKHPTTPSSTAKSSSDSFEQFRRAAREKEEREKALKAQAEHA
EKEKERLRQERMRSREDEDALEQARRAHEEARRRQEQQQQQRQEQQQQQQQQAAAVAA 
AATPQAQSSQPQSMLDQQRELARKREQERRRREAMAATIDMNFQSDLLSIFEENLF



Further analysis of the NOV79a protein yielded the following properties shown in 
Table 79B. 



Table 79B. Protein Sequence Properties NOV79a


PSort 
analysis: 


0.9800 probability located in nucleus; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV79a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 79C. 



Table 79C. Geneseq Results for NOV79a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV79a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAY57898 


Human transmembrane protein 
HTMPN-22 - Homo sapiens, 688 aa. 
[WO9961471-A2, 02-DEC-1999]


1..667 
1..667 


667/667 (100%) 
667/667 (100%) 


0.0 


AAY07027 


Breast cancer associated antigen
754 aa. [WO9904265-A2, 28-JAN-1999]


44..724
13..708


407/732 (55%)
487/732 (65%)


0.0


AAY07114 


WO9904265 Seq ID No: 685 - 
Homo sapiens, 947 aa. 
[WO9904265-A2, 28-JAN-1999]


35..738 
4..686 


357/761 (46%) 
444/761 (57%) 


e-170 


AAW81168 


Transcriptional regulatory factor 
RING3 - Homo sapiens, 947 aa. 
[WO9848015-A1, 29-OCT-1998] 


35..738
4..686


357/761 (46%)
444/761 (57%) 


e-170 


AAU16206 


Human novel secreted protein, Seq 
ID 1159 - Homo sapiens, 235 aa.
[WO200155322-A2, 02-AUG-2001] 


51..255 
1..203 


118/206 (57%) 
137/206 (66%) 


2e-59 



In a BLAST search of public sequence databases, the NOV79a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 79D. 



Table 79D. Public BLASTP Results for NOV79a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV79a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


O60885


Bromodomain-containing protein 4 
(HUNK1 protein) - Homo sapiens 
(Human), 1362 aa. 


1..1390 
1..1362


1357/1391 (97%) 
1360/1391 (97%) 


0.0 


Q9ESU6 


CELL PROLIFERATION 
RELATED PROTEIN CAP - Mus 
musculus (Mouse), 1400 aa. 


1..1390
1..1400


1318/1400(94%) 
1338/1400(95%) 


0.0 


AAL67833 


BROMODOMAIN-CONTAINING 
PROTEIN BRD4 LONG 
VARIANT - Mus musculus 
(Mouse), 1400 aa. 


1..1390
1..1400


1318/1400(94%) 
1338/1400(95%)


0.0 


O60433


R31546_1 - Homo sapiens
(Human), 731 aa (fragment). 


1..719
12..730


719/719(100%) 
719/719(100%)


0.0 


AAL67834 


BROMODOMAIN-CONTAINING 
PROTEIN BRD4 SHORT
VARIANT - Mus musculus 
(Mouse), 723 aa. 


1..719 
1..720 


694/720 (96%) 
700/720 (96%) 


0.0 



PFam analysis predicts that the NOV79a protein contains the domains shown in
Table 79E.



Table 79E. Domain Analysis of NOV79a 
Pfam Domain


NOV79a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


bromodomain: domain 1 of 
2 


63..152


42/92 (46%) 
82/92 (89%) 


8.6e-45 


bromodomain: domain 2 of 
2 


356..445 


40/92 (43%) 
81/92 (88%) 


3e-40 



Example 80. 

The NOV80 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 80A. 



Table 80A. NOV80 Sequence Analysis 




SEQ ID NO: 225 


1776 bp 


NOV80a,

CG59572-01 DNA Sequence 


TGGTTCGTTTATTCCTGGGGTTGTCATATCATGGCTTATAATGACACAGACAGAAACC 


AGACTGAGAAGCTCCTAAAAAGAGTACGAGAACTGGAGCAAGAGGTGCAAAGACTTAA 
AAAGGAACAGGCCAAAAATAAGGAGGACTCAAACATTAGAGAAAATTCAGCAGGAGCT 
GGAAAAACTAAGCGTGCATTTGATTTCAGTGCTCATGGCCGAAGACACGTAGCCCTAA 
G AATAGCCT AT ATGGGCTGGGG ATAC CAGGG CTTTGCT AGTCAGG AAAACACAAATAA 
TACCATTGAAGAGAAACTGTTTGAAGCTCTAACCAAGACTCGACTAGTAGAAAGCAGA 
CAGACATCC AACT ATCACCG ATGTGGG AGAACAG ATAAAGG AGTT AGTG C CTTTGG AC 
AGGTGATCTCACTTGACCTTCGCTCTCAGTTTCCAAGGGGCAGGGATTCCGAGGACTT
TAATGTAAAAGAGGAGGCTAATCCTGCTGCTGAAGAGATCCGTTATACCCACATTCTC 
AATCGGGTACTCCCTCCAGACATCCGTATATTGGCCTGGGCCCCTGTAGAACCAAGCT 
TCAGTGCTAGGTTCAGCTGCCTTGAGCGGACTTACCGCTATTTTTTCCCTCGTGCTGA 
TTTAGATATTGT AACCATGG ATTATGCAG CTCAGAAGTATGTTGG CACCCATGATTTC 
AGGAACTTGTGTAAAATGGATGTAGCCAACGGTGTGATTAATTTTCAGAGGACTATTC 
TATCTGCTCAAGTACAGCTAGTGGGCCAGAGCCCAGGTGAGGGGAGATGGCAAGAACC 
TTTCCAGTTATGTCAGTTTGAAGTGACTGGCCAGGCATTCCTTTATCATCAAGTCCGA 
TGTATGATGGCTATCCTCTTTCTGATTGGCCAAGGAATGGAGAAGCCAGAGATTATTG 
ATGAG CTGCTG AATATAG AGAAAAATCCCC AAAAG CCTCAATATAGT ATGGCTGTAG A 
ATTTCCTCTAGTCTTATATGACTGTAAGTTTGAAAATGTCAAGTGGATCTATGACCAG 
GAGGCTCAGGAGTTCAATATTACCCACCTACAACAACTGTGGGCTAATCATGCTGTCA 
AAACTCACATGTTGTATAGTATGCTACAAGGACTGGACACTGTTCCAGTACCCTGTGG 
AAT AGG ACC AAAG ATGG ATGG AATG A CAG AATGGGG AAATGTT AAG CCCT CTG T CAT A 
AAGCAGACCAGTGCCTTTGTAGAAGGAGTGAAGATGCGCACATATAAGCCCCTCATGG 
ACCGTCCTAAATGCCAAGGACTGGAATCCCGGATCCAGCATTTTGTACGTAGGGGACG
AATTGAGCACCCACATTTATTCCATGAGGAAGAAACAAAAGCCAAAAGGGACTGTAAT 
GACACACTAGAGGAAGAGAATACTAATTTGGAGACACCAACGAAGAGGGTCTGTGTTG 
ACACAGAAATTAAAAGCATCATTTAACCATAGACAATTTGCCAGGATCTAGGAACCAC




CTAATGGTAGGTGGACAGAAAAGGAAAAAAAAAAAAATTTACTTGCAAGTACTAGGAA




TTCAGATGATCAGCTCTT AAAAAAAAAAAAAAAGCAAAAAGACT AAAGC CCTATT AAG 




GAAGTTATTGCTTTAATAAGAAATTTCAAATATTCTCTTATCCCGGTCCAAAAGGATT 




AAGC<5ATTAAAGAACGTAAAATGGAGATGTATTTACATACACCTGGAAACCTGTGCCT 




TG T ATT C AAATTCAT T AAAG CCT AAT CCTGC AAG AA 




ORF Start: ATG at 31 


ORF Stop: TAA at 1474 




SEQ ID NO: 226 


481 aa 


MW at 55646.8 Da


NOV80a, 

CG59572-01 Protein Sequence 


MAYNDTDRNQTEKLLKRVRELEQEVQRLKKEQAKNKEDSNIRENSAGAGKTKRAFDFS
AHGRRHVALRIAYMGWGYQGFASQENTNNTIEEKLFEALTKTRLVESRQTSNYHRCGR
TDKGVSAFGQVISLDLRSQFPRGRDSEDFNVKEEANAAAEEIRYTHILNRVLPPDIRI
LAWAPVEPSFSARFSCLERTYRYFFPRADLDIVTMDYAAQKYVGTHDFRNLCKMDVAN
GVINFQRTILSAQVQLVGQSPGEGRWQEPFQLCQFEVTGQAFLYHQVRCMMAILFLIG
QGMEKPEIIDELLNIEKNPQKPQYSMAVEFPLVLYDCKFENVKWIYDQEAQEFNITHL
QQLWANHAVKTHMLYSMLQGLDTVPVPCGIGPKMDGMTEWGNVKPSVIKQTSAFVEGV
KMRTYKPLMDRPKCQGLESRIQHFVRRGRIEHPHLFHEEETKAKRDCNDTLEEENTNL
ETPTKRVCVDTEIKSII




SEQ ID NO: 227 


1508 bp 


NOV80b,

CG59572-02 DNA Sequence


CATGGCTTATAATGACACAGACAGAAACCAGACTGAGAAGCTCCTAAAAAGAGTACGA
GAACTGGAGCAAGAGGTGCAAAGACTTAAAAAGGAACAGGCCAAAAATAAGGAGGACT
CAAACATTAGAGAAAATTCAGCAGGAGCTGGAAAAACTAAGCGTGCATTTGATTTCAG 
TGCTCATGGCCGAAGACACGT AG CCCT AAGAAT AG CCTAT ATGGGCTGGGGATACCAG 
GGCTTTGCTAGTCAGGAAAACACAAATAATACCATTGAAGAGAAACTGTTTGAAGCTC 
T AAC CAAGACTCG ACTAGT AGAAAGCAGACAGAC ATC CAACT AT CACCGATGTGGGAG 
AAC AG ATAAAGGAGTTAGTGCCTTTGGACAGGTG ATCT CACTTGACCTTCGCTCT CAG 
TTTCCAAGGGGCAGGGATTCCGAGGACTTTAATGTAAAAGAGGAGGCTAATGCTGCTG
CTGAAGAGATCCGTTATACCCACATTCTCAATCGGGTACTCCCTCCAGACATCCGTAT 
ATTGG C CTGGGCCCCTGTAGAACCAAGCTTCAGTG CTAGGTTCAGCTG CCTTGAGCGG 
ACTTACCGCTATTTTTTCCCTCGTGCTGATTTAGATATTGTAACCATGGATTATGCAG 
CTCAGAAGTATGTTGGCACCCATGATTTCAGGAACTTGTGTAAAATGGATGTAGCCAA 
CGGTGTGATTAATTTTCAGAGGACTATTCTATCTGCTCAAGTACAGCTAGTGGGCCAG 
AGC C C AGGTG AGGGG AG ATGG CAAG AACCTTTC CAGTT ATG T CAG TTTG AAG T G ACTG 
GCCAGGCATTCCTTT ATCATCAAGTCCGATGTATGATGG CTATC CTCTTTCTG ATTGG 
C CAAGG AATGG AG AAGC CAG AG ATT ATTG ATG AGCTG CT G AATAT AG AG AAAAAT C C C 
CAAAAGCCTCAATATAGTATCGCTGTAGAATTTCCTCTAGTCTTATATGACTGTAAGT 
TTGAAAATGTCAAGTGGATCTATGACCAGGAGGCTCAGGAGTTCAATATTACCCACCT 
ACAACAACTGTGGGCTAATCATGCTGTCAAAACTCACATGTTGTATAGTATGCTACAA 
GGACTGGACACTGTTCCAGTACCCTGTGGAATAGGACCAAAGATGGATGGAATGACAG 
AATGGGGAAATGTTAAGCCCTCTGTCATAAAGCAGACCAGTGCCTTTGTAGAAGGAGT 
GAAGATGCGCACATATAAGCCCCTCATGGACCGTCCTAAATGCCAAGGACTGGAATCC 
CGGAT CCAGCATTTTGTACGTAGGGGACG AATTGAGCAC CCACATTT ATTCCATGAGG 
AAGAAACAAAAGCCAAAAGGGACTGTAATGACACACTAGAGGAAGAGAATACTAATTT 
GGAGACACCAACGAAGAGGGTCTGTCTTGACACAGAAATTAAAAGTATCATTTAACCA 
TAGACAATTTGCCAGGATCTAGGAACCACCTAATGGTAGGTGGACAGAAAAGGAAAAA 




ORF Start: ATG at 2


ORF Stop: TAA at 1445




SEQ ID NO: 228


481 aa


MW at 55646.8 Da


NOV80b, 

CG59572-02 Protein Sequence 


MAYNDTDRNQTEKLLKRVRELEQEVQRLKKEQAKNKEDSNIRENSAGAGKTKRAFDFS
AHGRRHVALRIAYMGWGYQGFASQENTNNTIEEKLFEALTKTRLVESRQTSNYHRCGR
TDKGVSAFGQVISLDLRSQFPRGRDSEDFNVKEEANAAAEEIRYTHILNRVLPPDIRI
LAWAPVEPSFSARFSCLERTYRYFFPRADLDIVTMDYAAQKYVGTHDFRNLCKMDVAN
GVINFQRTILSAQVQLVGQSPGEGRWQEPFQLCQFEVTGQAFLYHQVRCMMAILFLIG
QGMEKPEIIDELLNIEKNPQKPQYSMAVEFPLVLYDCKFENVKWIYDQEAQEFNITHL
QQLWANHAVKTHMLYSMLQGLDTVPVPCGIGPKMDGMTEWGNVKPSVIKQTSAFVEGV
KMRTYKPLMDRPKCQGLESRIQHFVRRGRIEHPHLFHEEETKAKRDCNDTLEEENTNL
ETPTKRVCVDTEIKSII 
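

The relationship between a listed DNA sequence, its ORF start, and the predicted polypeptide can be illustrated by translating the first codons with the standard genetic code. The sketch below is a minimal example under stated assumptions: it uses the standard codon table, treats the ORF start as a 1-based position, and takes only the first 34 bases of the NOV80b DNA sequence. The names `CODON_TABLE` and `translate` are hypothetical and do not refer to any software named in this document.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str, orf_start: int) -> str:
    """Translate from the 1-based ORF start until a stop codon or the end.
    Whitespace is ignored; a non-ACGT character raises KeyError."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

if __name__ == "__main__":
    # First 34 bases of the NOV80b DNA sequence; ORF Start: ATG at 2.
    prefix = "CATGGCTTATAATGACACAGACAGAAACCAGACT"
    print(translate(prefix, 2))  # MAYNDTDRNQT
```

The printed residues match the start of the NOV80b protein sequence listed above, which is the kind of agreement one would expect for any of the sequence pairs in these tables.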



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 80B. 



Table 80B. Comparison of NOV80a against NOV80b. 


Protein Sequence 


NOV80a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV80b 


1..481 

1..481


459/481 (95%) 
459/481 (95%) 



Further analysis of the NOV80a protein yielded the following properties shown in 
Table 80C. 



Table 80C. Protein Sequence Properties NOV80a 


PSort 
analysis: 


0.6500 probability located in cytoplasm; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.0142 probability located in microbody (peroxisome) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
A search of the NOV80a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 80D. 



Table 80D. Geneseq Results for NOV80a 


Geneseq 
Identifier


Protein/Organism/Length [Patent 
#, Date]


NOV80a 
Residues/ 

Match
Residues 


Identities/ 
Similarities for 
the Matched
Region 


Expect 
Value 


AAM79457 


Human protein SEQ ID NO 3103 -
Homo sapiens, Hyyj aa.
[WO200157190-A2, 09-AUG-2001] 


1..481 


478/481 (99%) 


0.0 


AAM78473 


Human protein SEQ ID NO 1135 -
Homo sapiens, 481 aa.
[WO200157190-A2, 09-AUG-2001] 


1..481 
1..481 


478/481 (99%) 
480/481 (99%)


0.0 


AAG64907


Human depressed growth rate 
protein DEG1 - Homo sapiens, 248 
aa. [CN1296014-A, 23-MAY-2001] 


209..431
1..223 


223/223 (100%) 
223/223 (100%) 


e-132 


AAG02637 


Human secreted protein, SEQ ID 
NO: 6718 - Homo sapiens, 96 aa. 
[EP1033401-A2, 06-SEP-2000] 


361..456
1..96 


96/96(100%) 
96/96 (100%) 


5e-53 


AAB96592 


Putative P. abyssi pseudouridylate
synthase I - Pyrococcus abyssi, 263 
aa. [FR2792651-A1, 27-OCT-2000] 


65..367
3..261


79/305 (25%) 
140/305 (45%) 


4e-16 



In a BLAST search of public sequence databases, the NOV80a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 80E. 



Table 80E. Public BLASTP Results for NOV80a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV80a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Portion


Expect 
Value 


Q9BZE2 


FKSG32 - Homo sapiens (Human), 481 
aa. 


1..481 
1..481 


481/481 (100%) 
481/481 (100%)


0.0 


Q96J23 


HYPOTHETICAL 55.6 KDA 
PROTEIN - Homo sapiens (Human), 
481 aa. 


1..481 
1..481 


478/481 (99%) 
480/481 (99%) 


0.0 


Q96NB4 


CDNA FLJ31140 FIS, CLONE
IMR322001218, HIGHLY SIMILAR
TO MUS MUSCULUS
PSEUDOURIDINE SYNTHASE 3
(PUS3) MRNA - Homo sapiens
(Human), 481 aa.


1..481
1..481 


478/481 (99%) 
479/481 (99%) 


0.0 


Q9JI38 


PSEUDOURIDINE SYNTHASE 3 - 
Mus musculus (Mouse), 481 aa. 


5..480 
4..480 


407/479 (84%) 
434/479 (89%) 


0.0 


Q9D0F7 


2610020J05RIK PROTEIN - Mus 
musculus (Mouse), 316 aa. 


5..314 
4..315 


276/312(88%) 
291/312(92%) 


e-158 



PFam analysis predicts that the NOV80a protein contains the domains shown in
Table 80F.



Table 80F. Domain Analysis of NOV80a 


Pfam Domain 


NOV80a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


PseudoU_synth_1: domain 1
of 1


88..307 


70/249 (28%) 
176/249(71%) 


4.7e-57 



Example 81. 

The NOV81 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 81A.



Table 81A. NOV81 Sequence Analysis 




SEQ ID NO: 229


3080 bp


NOV81a, 

CG59522-01 DNA Sequence 


TTCCAGCCGGCAGGATGGAGGACGAGGAAGGCCCTGAGTATGGCAAACCTGACTTTGT 
GCTTTTGG ACCAAGTGAC CATGG AGGACTTCATG AGGAAC CTGC AGCTCAGGTTCG AG 
AAGGGCCGCATCTACACCTACATCGGTGAGGTGCTGGTGTCCGTGAACCCCTACCAGG 
AG CTG CC C CTG T ATGGG C CTG AGGCC ATCG CCAGG T A CC AGGG C CGTG AG CT CT AT G A 
GCGGCCACCCCATCTCT ATG CTGTGGCCAACGCCGCCT ACAAGG CAATGAAG CACCGG 
TCCAGGGACACCTGCATCGTCATCTCAGGGGAGAGTGGGGCAGGGAAGACAGAAGCCA

GT AAGCACATCATG CAGTACATCGCTG CTGTCACCAAT CCAAGCCAGAGGGCTGAGGT 

GGAGAGGGTCAAGGACGTGCTGCTCAAGTCCACCTGTGTGCTGGAGGCCTTTGGCAAT 

GCCCGCACCAACCGCAATCACAACTCCAGCCGCTTTGGCAAGTACATGGACATCAACT 

TTGACTTCAAGGGGGACCCGATCGGAGGACACATCCACAGCTACCTACTGGAGAAGTC

TCGGGTCCTCAAGCAGCACGTGGGTGAAAGAAACTTCCACGCCTTCTACCAATTGCTG 

AG AGGCAGTG AGGACAAG CAGCTGCATGAACTGC ACTTGG AGAG AAACCCTG CTGTAT 

A CAATTTCACAC AC CAGGG AGC AGGACT C AACATG ACT G TG AG TG ATG AG CAG AG CCA 

CCAGGCAGTGACCGAGGCCATGAGGGTCATCGGCTTCAGTCCTGAAGAGGTGGAGTCT 

GTG CATCG CATCCTGGCTGCCATATTGCACCTGGGAAACATCGAGTTTGTGGAGACGG 

AGGAGGGTGGGCTGCAGAAGGAGGGCCTGGCAGTGGCCGAGGAGGCACTGGTGGACCA 

TGTGGCTGAGCTGACGGCCACACCCCGGGACCTCGTGCTCCGCTCCCTGCTGGCTCGC 

ACAGTTG CCTCGGGAGGCAGGGAACTCAT AGAGAAGGG CCACACTGCAGCTGAGGCCA 

GCTATGCCCGGGATGCCTGTGCCAAGGCAGTGTACCAGCGGCTGTTTGAGTGGGTGGT

GAACAGGATCAACAGTGTCATCGAACCCCOGGGCCGGGATCCTCGGCGTGATGGCAAG 

GACACAGTC^TTGGCGTGCTGGACATCTATGGCITCGAGGTCT^ 

T CG AGCAG TT CTG CATCAACT ACTGC AACG AG AAGCTG C AG CAG CT ATT CAT C CAG CT 

CATCCTGAAGCAGGAACAGGAAGAGTACGAGCGCGAGGGCATCACCTGGCAGAGCGTT 

GAGTATTTCAACAACGCCACCATTGTGGATCTGGTGGAGCGGCCCCACCGTGGCATCC 

TGGCCGTGCTGGACGAGGCCTGCAGCTCTGCTGGCACCATCACTGACCGAATCTTCCT 

GCAGACCCTGGACATG CACCACCGCCATCACCTACACTACACCAGCCG CCAG CTCTGC 

CCCACAGACAAG ACCATGGAGTTTGG CCGAG ACTTCCGGATCAAGCACT ATG CAGGGG 

ACGTCACGTACTCCGTGGAAGGCTT^TCGACMGAACAGAGATTTCCTCTTCCAGGA 

CTTCAAGCGGCTGCTGTACAACAGCACGGACCCCACTCTACGGGCCATGTGGCCGGAC 

GGG C AG CAGGACATCACAGAGGTGACCAAG CGCCCCCTGACGGCTGGCACACTCTTCA 

AGAACTCCATGGTGGCCCTGGTGGAGAACCTTGCCTCCAAGGAGCCCTTCTACGTCCG 

CTG CAT CAAG CCCAATGAGG AC AAGG TAGCTGGG AAG CTGG ATGAG AACC ACTGT CG C 

CACCAGGTCGCATACCTGGGGCTGCrGGAGAATGTGAGGGTCCGCAGGGCTGGCTTCG 

CTTC C CGCCAGCCCTACTCTCG ATTCCTGCTCAGGTAC AAGATGAC CTGTGAAT AC AC 

ATGGCCCAACCACCTCCTGGGCTCCGACAAGGCAGCCGTXSAGCGCTCTCCTGGAGCAG 

CACGGGCTGCAGGGGGACGT^CCTTTXSGCCACAGCAAGCTGT^CATCCGCTCACCCC 

GGACACTGGTCACACTGGAGCAGAGCCGAGCCCGCCTCATCCCCATCATTGTGCTGCT 

ATTGC^GAAGGCATGGCXXKSGCACCTTCGCGAGGTCGCGCTCCCGGAGGCTCAGGGCT 

ATCTACACCATCATGCGCTGGTTCCGGAGACACAAGGTGCGGG CTCACCTGG CTGAG C 

TGCAGCGGCGATTCCAGGCTGCAAGGCAGCCGCCACTCTACGGGCGTGACCTTGTGTG 

GCCGCTGCCCCCTGCTGTGCTG(^GCCCTrCChGGhCACCTGC(^CGCACTCTTCTGC 

AGGTGGCGGGCCCrGGCAGCTGGTGAAGAACATCCCCCCTTCAGACATGCCCCAGATCA 

AGGCCAAGGTGGCCGCCATGGGGGCCCTGCAAGGGCTTCGTCAGGACTGGGGCTGCCG 

ACGGGCCTCGGCCCGAGACTACCTGTCCTCTGCCACTGACAATCCCACAGCATCAAGC 

CTGTTTGCTCAGCGACTAAAGACACTTCAGGACAAAGATGGCTTCGGGGCTGTGCTCT 

TTTCAAGCCATGTCCG CAAGGTGAACCG CTTCC ACAAGATCCGG AACCGGGC CCTCCT 

GCTCACAGACCAGCACCTCTACTAAGCrGGACCCrGACCGGCAGTACCGGGTGATGCGG 

GCCGTGCCCCTTGAGGCGGTGACGGGGCTGAGCGTGACCAGCGGAGGAGACCAGCTGG 

TGGTGCTGCACGCCCGCGGCCAGGACGACCTCGTGGTGTGCCTGCACCGCTCCCGGCC 

GCC ATTGG ACAACCGCGTTGGGGAGCTGGTGGGCGTGCTGGCCG CACACTGCCG CAGG 

GAGGGCCGCACCCTGGAGGTTCGCGTCTCCGACTGCATCCCACTAAGCCATCGCGGGG 

TCCGGCGCCTCATCTCCGTGGAGCCCAGGCCGGAGCAGCCAGAGCCCGATTTCCGCTG 

CGCTCGCGGCTCCTTCACCCTGCrrCTGGCCCAGCCGCTQAGCGCCCGCACCCGCCGCA 

CCCCGA 




ORF Start: ATG at 15 


ORF Stop: TGA at 3054 




SEQ ID NO: 230 


1013 aa 


MW at 116044.5 Da


NOV81a, 

CG59522-01 Protein Sequence 


MEDEEGPEYGKPDFVLLDQVTMEDFMRNLQLRFEKGRIYTYIGEVLVSVNPYQELPLY 
GPEAIARYQGRELYERPPHLYAVANAAYKAMKHRSRDTCIVISGESGAGKTEASKHIM 
QYIAAVTNPSQRAEVERVKDVLLKSTCVLEAFGNARTNRNHNSSRFGKYMDINFDFKG
DPIGGHIHSYLLEKSRVLKQHVGERNFHAFYQLLRGSEDKQLHELHLERNPAVYNFTH 
QGAGLNMTVSDEQSHQAVTEAMRVIGFSPEEVESVHRILAAILHLGNIEFVETEEGGL
QKEGLAVAEEALVDHVAELTATPRDLVLRSLLARTVASGGRELIEKGHTAAEASYARD
ACAKAVYQRLFEWVVNRINSVMEPRGRDPRRDGKDTVIGVLDIYGFEVFPVNSFEQFC
INYCNEKLQQLFIQLILKQEQEEYEREGITWQSVEYFNNATIVDLVERPHRGILAVLD
EACSSAGTITDRIFLQTLDMHHRHHLHYTSRQLCPTDKTMEFGRDFRIKHYAGDVTYS
VEGFIDKNRDFLFQDFKRLLYNSTDPTLRAMWPDGQQDITEVTKRPLTAGTLFKNSMV 
ALVENLASKEPFYVRCIKPNEDKVAGKLDENHCRHQVAYLGLLENVRVRRAGFASRQP
YSRFLLRYKMTCEYTWPNHLLGSDKAAVSALLEQHGLQGDVAFGHSKLFIRSPRTLVT
LEQSRARLIPIIVLLLQKAWRGTLARWRCRRLRAIYTIMRWFRRHKVRAHLAELQRRF
QAARQPPLYGRDLVWPLPPAVLQPFQDTCHALFCRWRARQLVKNIPPSDMPQIKAKVA 
AMGALQGLRQDWGCRRAWARDYLSSATDNPTASSLFAQRLKTLQDKDGFGAVLFSSHV 
RKVNRFHKIRNRALLLTDQHLYKLDPDRQYRVMRAVPLEAVTGLSVTSGGDQLVVLHA
RGQDDLVVCLHRSRPPLDNRVGELVGVLAAHCRREGRTLEVRVSDCIPLSHRGVRRLI
SVEPRPEQPEPDFRCARGSFTLLWPSR 



Further analysis of the NOV81a protein yielded the following properties shown in
Table 81B.
Table 81B. Protein Sequence Properties NOV81a


PSort 
analysis: 


0.8800 probability located in nucleus; 0.3902 probability located in microbody 
(peroxisome); 0.2210 probability located in lysosome (lumen); 0.1000 
probability located in mitochondrial matrix space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV81a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 81C. 



Table 81C. Geneseq Results for NOV81a


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV81a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAU23125 


Novel human enzyme polypeptide 
#211 - Homo sapiens, 1026 aa.
[WO200155301-A2, 02-AUG- 
2001] 


1..1013 
9..1026 


1009/1018 (99%)
1011/1018 (99%)


0.0 


AAU23128 


Novel human enzyme polypeptide 
#214 - Homo sapiens, 909 aa. 
[WO200155301-A2, 02-AUG- 
2001] 


1..853 
9..866 


851/858(99%) 
851/858(99%) 


0.0 


AAM80123 


Human protein SEQ ID NO 3769 - 
Homo sapiens, 764 aa. 
[WO200157190-A2, 09-AUG- 
2001] 


243..1011 
1..762 


438/769 (56%) 
570/769 (73%) 


0.0 


AAM79139 


Human protein SEQ ID NO 1801 - 
Homo sapiens, 753 aa. 
[WO200157190-A2, 09-AUG- 
2001] 


254..1011
1..751 


434/758 (57%) 
564/758 (74%) 


0.0 


AAM39991 


Human polypeptide SEQ ID NO 
3136 - Homo sapiens, 1063 aa. 
[WO200153312-A1, 26-JUL-2001] 


10..933 
47..986 


410/966 (42%) 
556/966 (57%) 


0.0 



In a BLAST search of public sequence databases, the NOV81a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 81D.



Table 81D. Public BLASTP Results for NOV81a 




Protein
Accession
Number


Protein/Organism/Length


NOV81a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Portion


Expect
Value


Q63357 


MYOSIN I - Rattus norvegicus
(Rat), 1006 aa. 


1..1011 
1..1004 


606/1011 (59%) 
780/1011 (76%) 


0.0 


A53933 


myosin I myr 4 - rat, 1006 aa. 


1..1011 
1..1004 


604/1011 (59%) 
778/1011 (76%)


0.0 


Q96RI6 


UNCONVENTIONAL MYOSIN
1G VALINE FORM - Homo sapiens
(Human), 633 aa (fragment).


33..646 
1..619


612/619 (98%) 
612/619 (98%) 


0.0 


Q96RI5 


UNCONVENTIONAL MYOSIN 
1G METHIONINE FORM - Homo
sapiens (Human), 633 aa (fragment). 


33..646 
1..619


611/619(98%) 
612/619 (98%) 


0.0 


Q23978 


Myosin IA (MIA) (Brush border 
myosin IA) (BBMIA) - Drosophila 
melanogaster (Fruit fly), 1011 aa.


8..1012 
6..1007


503/1017(49%) 
686/1017(66%) 


0.0 



PFam analysis predicts that the NOV81a protein contains the domains shown in
Table 81E.



Table 81E. Domain Analysis of NOV81a 


Pfam Domain 


NOV81a Match
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


PRK: domain 1 of 1 


97..109 


8/13 (62%) 
10/13 (77%) 


3.7 


Vir DNA binding: domain 1 
of 1 


575..592


5/18 (28%) 
14/18 (78%) 


8.2 


myosin_head: domain 1 of 1 


11..689


305/747 (41%) 
531/747(71%) 


8.1e-288 



Example 82. 

The NOV82 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 82A. 



Table 82A. NOV82 Sequence Analysis 




SEQ ID NO: 231


1066 bp


NOV82a, 

CG59520-01 DNA Sequence 


GAACGAATGGGAAACCAGAAATCAGATATTTATGCCCAAGCAAAGCAGGATTTCGTTC
AGCACTACTCCCAGATCGTTAGGGTGCTGACTGAGGATGAGATGGGGCACCCAGAGAC 
AGGAGATGCTACTGCCCGGCTCAAGGAGGTCCTGGAGTACAATGCCATTGGAGGCAAG 
TATCACCGAGGTTTGATGGTGCTAGTAGCGTTCCGGGAGCTGGTGGAGCCGAGGAAAC 
TGGATGCTGATAGTCTCCAGTGGGCACCGACTGTGGGCTGGTATGCGCAACTGCTGCA
AGCTTTCTTCCTGGTGGCAGATGACATTATGGATTCATCCCTTACCTGCCAGGGACAG
ATCTCCTGGTATCAGAAGCTGGGCATGGGTTTGGATGCCATCAATGATGCTATCCTTC 
TGGAAGCATGTATCTACTGCCTGCTGAAGCTGTATTGCCGGGAGCAGCCCTATTACCT 
GAACCTGATGGAGCTCTTCCAGCAGAATTCTTATCAGACTGAGATTGGGCAGACCCTC 
GACCTCATCACAACCCCCCAGGGCAATGTGGATCTTCGCAGATGCACCGAAAAAAGGC 
ACAAATCTGTTGTCAAGTACAAGACAGCTTTCTACTCCTTCTACCTTCCTGTAGCTGC 
AGCCATGTACATGTCAAGAATGGATGACAAGAAGGAGCACACCAGTGCCAAGAAGATC
CTGCTGGAGATTCAAGAGTTCTTTCAGATTCAGGATGATTACCTTGACTTCTCTGGGG
ACCCCAGTGTGACTGGCAGAGTTGGCAATGACTTCCAGGACAACAAATGCAGCTGGCT
GGTGGTTCAGTGTCTGCTACAGGCCACTCCAGAACAGTACCAGATCCTGAAGGAAAAT
TACAGGCAGAAGGAGGCCGAGAAGGTGGCCCGGGTGAAGGCACTATACGAGGAGCTGG
ATCTGCCAGCCGTGTTCTTGCAGTATGAGAAAGACAGTTACAGCCACGTTATGGGTCT
CATCGAACAGTACGCAGAGCCCCTGCCCCCAGCCATCTTTCTGGGGCTTGTGCACAAA
ATCTACAAGTGGAAAAAGTGAC 




ORF Start: ATG at 7 


ORF Stop: TGA at 1063




SEQ ID NO: 232 


352 aa 


MW at 40740.3kD 


NOV82a, 

CG59520-01 Protein Sequence 


MGNQKSDIYAQAKQDFVQHYSQIVRVLTEDEMGHPETGDATARLKEVLEYNAIGGKYH 
RGLMVLVAFRELVEPRKLDADSLQWAPTVGWYAQLLQAFFLVADDIMDSSLTCQGQIS 
WYQKLGMGLDAINDAILLEACIYCLLKLYCREQPYYLNLMELFQQNSYQTEIGQTLDL
ITTPQGNVDLRRCTEKRHKSVVKYKTAFYSFYLPVAAAMYMSRMDDKKEHTSAKKILL
EIQEFFQIQDDYLDFSGDPSVTGRVGNDFQDNKCSWLVVQCLLQATPEQYQILKENYR
QKEAEKVARVKALYEELDLPAVFLQYEKDSYSHVMGLIEQYAEPLPPAIFLGLVHKIY
KWKK 
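
The ORF coordinates in Table 82A can be cross-checked against the reported protein length. The following is a minimal Python sketch, not part of the original analysis. It assumes 1-based coordinates, that the ORF Stop position marks the first base of the stop codon, and the standard genetic code. The function name translate_orf and the toy sequence are hypothetical and are used for illustration only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna: str, orf_start: int, orf_stop: int) -> str:
    """Translate dna from orf_start up to, but not including, the stop codon.

    Assumption: both coordinates are 1-based and orf_stop is the position
    of the first base of the stop codon, as the tables appear to use.
    """
    dna = dna.upper().replace(" ", "")
    coding = dna[orf_start - 1 : orf_stop - 1]
    if len(coding) % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return "".join(
        CODON_TABLE[coding[i : i + 3]] for i in range(0, len(coding), 3)
    )


# Hypothetical toy sequence with ATG at position 3 and TGA at position 15.
toy = "CCATGGGAAACCAGTGACC"
print(translate_orf(toy, 3, 15))  # MGNQ

# Codon count implied by the Table 82A coordinates (ATG at 7, TGA at 1063).
print((1063 - 7) // 3)  # 352
```

Under these assumptions the coordinates in Table 82A imply 352 codons, which agrees with the 352 aa length listed for SEQ ID NO: 232.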



Further analysis of the NOV82a protein yielded the following properties shown in 
Table 82B. 



Table 82B. Protein Sequence Properties NOV82a 


PSort 
analysis: 


0.4066 probability located in microbody (peroxisome); 0.3000 probability 
located in nucleus; 0.1000 probability located in mitochondrial matrix space; 
0.1000 probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV82a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 82C. 



Table 82C. Geneseq Results for NOV82a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV82a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG29733 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 35427 - Arabidopsis
thaliana, 342 aa. [EP1033405-A2, 06-
SEP-2000]


10..352 
2..342 


147/343 (42%) 
219/343 (62%) 


7e-75


AAG29732


Arabidopsis thaliana protein fragment 
SEQ ID NO: 35426 - Arabidopsis 
thaliana, 349 aa. [EP1033405-A2, 06- 
SEP-2000] 


10..352 
9..349


147/343 (42%) 
219/343 (62%) 


7e-75 


AAG29734 


Arabidopsis thaliana protein fragment
SEQ ID NO: 35428 - Arabidopsis 
thaliana, 305 aa. [EP1033405-A2, 06- 
SEP-2000] 


47..352 
1..305 


138/306(45%) 
204/306 (66%) 


4e-73 


AAY43635 


Amino acid sequence of the farnesyl 
pyrophosphate synthase enzyme - 
Phaffia rhodozyma, 355 aa. 
[EP955363-A2, 10-NOV-1999] 


12..352 
11..355 


145/346 (41%) 
208/346 (59%) 


4e-69 


AAB48971 


Sunflower seedling farnesyl 
pyrophosphate synthase (FPS) - 
Helianthus annuus, 341 aa. 
[EP1063297-A1, 27-DEC-2000]


13..352 
6..341 


138/343 (40%) 
204/343 (59%) 


3e-64 



In a BLAST search of public sequence databases, the NOV82a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 82D.



Table 82D. Public BLASTP Results for NOV82a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV82a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96G29 


FARNESYL DIPHOSPHATE
SYNTHASE (FARNESYL
PYROPHOSPHATE
SYNTHETASE,
DIMETHYLALLYLTRANSTRANSFERASE,
GERANYLTRANSTRANSFERASE)
- Homo sapiens (Human), 419
aa.


2..352 
69..419 


291/351 (82%) 
317/351 (89%) 


e-168 




Farnesyl pyrophosphate synthetase
(FPP synthetase) (FPS) (Farnesyl
diphosphate synthetase) [Includes:
Dimethylallyltransferase (EC
2.5.1.1); Geranyltranstransferase
(EC 2.5.1.10)] - Homo sapiens
fl-TumaTi^ aa 


3..353 


/'COO/A 

317/351 (89%)


e-168


A35726 


farnesyl-pyrophosphate synthetase 
- human, 353 aa.


2..352 
3..353 


290/351 (82%) 
316/351 (89%) 


e-168 


AAL58886


FARNESYL DIPHOSPHATE 
SYNTHASE - Bos taurus (Bovine), 
353 aa. 


2..352 
3..353 


270/351 (76%) 
308/351 (86%) 


e-157 


Q14329 


FARNESYL PYROPHOSPHATE 
SYNTHETASE LIKE-4 PROTEIN 
- Homo sapiens (Human), 348 aa. 


6..352 
2..348 


268/347 (77%) 
295/347 (84%) 


e-150 
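
The Expect values in Tables 82C and 82D use an abbreviated notation in which a missing mantissa (for example e-168) stands for 1e-168. The following is a minimal Python sketch for converting such strings to numbers so that hits can be sorted or filtered. The helper name parse_expect and the 1e-50 cutoff are assumptions made for illustration and are not part of the original analysis.

```python
def parse_expect(text: str) -> float:
    """Convert an Expect string such as 'e-168', '7e-75' or '0.0' to a float."""
    text = text.strip().lower()
    if text.startswith("e"):
        # The mantissa is omitted when it equals 1.
        text = "1" + text
    return float(text)


# Expect values copied from Table 82D, keyed by accession number.
hits = {"Q96G29": "e-168", "A35726": "e-168", "Q14329": "e-150"}

# Hypothetical significance cutoff, chosen only for illustration.
significant = [acc for acc, e in hits.items() if parse_expect(e) < 1e-50]
print(significant)
```

An Expect value reported as 0.0 parses to zero, so it sorts ahead of every other hit.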



PFam analysis predicts that the NOV82a protein contains the domains shown in the 
Table 82E. 



Table 82E. Domain Analysis of NOV82a


Pfam Domain 


NOV82a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


polyprenyl_synt: domain 1 of
1


43..315 


82/285 (29%) 
237/285 (83%) 


6.3e-91



Example 83.

The NOV83 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 83A. 



Table 83A. NOV83 Sequence Analysis 




SEQ ID NO: 233 


411 bp 


NOV83a, 

CG59758-01 DNA Sequence 


TGCCTACCCCGAGACTGCTGCTGTTCGGAGACCTGCAGGTGAATGCCCCATCACCATG


TCTGACCTGGAGGCAAAACCTTCAACTGAGCATTTGGGGGATAAGATAAAAGATGAAG
ATATTAAACTCAGGGTTATTGGACAGGATAGCAGTGAGATTCATTTCAAAGTGAAAAT 
GACAACACCTCTCAAGAAACTCAAGAAATCGTACTGTCAGAGACAGGGCGTTCCAGTG 
AATTCCCTCAGGTTTCTCTTTGAAGGTCAGAGAATTGCTGATAATCATACTCCAGAAG 
AACTGGGAATGGAGGAAGAAGATGTGATTGAGGTTTATCAGGAACAAATCGGAGGTCA 
TTCAACAGTTTAGACATTCTTTTTTTTTTTCCTTTTCCCTCAATCCTTTTTTATTTTT 


TTAAA 




ORF Start: ATG at 56 


ORF Stop: TAG at 359 




SEQ ID NO: 234 


101 aa 


MW at 11526.0kD


NOV83a, 

CG59758-01 Protein Sequence


MSDLEAKPSTEHLGDKIKDEDIKLRVIGQDSSEIHFKVKMTTPLKKLKKSYCQRQGVP
VNSLRFLFEGQRIADNHTPEELGMEEEDVIEVYQEQIGGHSTV




SEQ ID NO: 235 


658 bp 


NOV83b, 

CG59758-02 DNA Sequence


CTACCCCGAGACTGCTGCTGTTCGGAGACCTGCAGGTGAATGCCCCATCACCATGTCT 


GACCTGGAGGCAAAACCTTCAACTGAGCATTTGGGGGATAAGATAAAAGATGAAGATA
TTAAACTCAGGGTTATTGGACAGGATAGCAGTGAGATTCATTTCAAAGTGAAAATGAC 
AACACCTCTCAAGAAACTCAAGAAATCGTACTGTCAGAGACAGGGCGTTCCAGTGAAT 
TCCCTCAGGTTTCTCTTTGAAGGTCAGAGAATTGCTGATAATCATACTCCAGAAGAAC 
TGGGAATGGAGGAAGAAGATGTGATTGAGGTTTATCAGGAACAAATCGGAGGTCATTC 
AACAGTTTAGACAATCGGAGGTCATTCAACAGTTTAGACAATCGGAGGTCATTCAACA 


GTTTAGACAATCGGAGGTCATTCAACAGTTTAGACAATCGGAGGTCATTCAACAGTTT 


AGACAATCGGAGGTCATTCAACAGTTTAGACAATCGGAGGTCATTCAACAGTTTAGAC 


AATCGGAGGTCATTCAACAGTTTAGACAATCGGAGGTCATTCAACAGTTTAGACAATC 


GGAGGTCATTCAACAGTTTAGACAATCGGAGGTCATTCAACAGTTTAGACAATCGGAG 


GTCATTCAACAGTTTAGACA 




ORF Start: ATG at 53 


ORF Stop: TAG at 356 




SEQ ID NO: 236 


101 aa 


MW at 11526.0kD 


NOV83b, 

CG59758-02 Protein Sequence 


MSDLEAKPSTEHLGDKIKDEDIKLRVIGQDSSEIHFKVKMTTPLKKLKKSYCQRQGVP 
VNSLRFLFEGQRIADNHTPEELGMEEEDVIEVYQEQIGGHSTV
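
The MW entries in the sequence tables can be approximated by summing average residue masses and adding one water. The following is a minimal Python sketch of that calculation. The residue masses are standard average values in daltons, the function name molecular_weight is hypothetical, and the method actually used to produce the tabulated figures is not stated in the source.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(protein: str) -> float:
    """Sum average residue masses and add one water for the free termini."""
    protein = protein.upper().replace(" ", "")
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


# The 101-residue NOV83 protein sequence from Table 83A.
nov83 = (
    "MSDLEAKPSTEHLGDKIKDEDIKLRVIGQDSSEIHFKVKMTTPLKKLKKSYCQRQGVP"
    "VNSLRFLFEGQRIADNHTPEELGMEEEDVIEVYQEQIGGHSTV"
)
print(len(nov83), round(molecular_weight(nov83), 1))
```

For this sequence the sum comes to about 11526.0, in agreement with the MW entry in Table 83A, which suggests that the tabulated figures are expressed in daltons.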



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 83B. 



Table 83B. Comparison of NOV83a against NOV83b.


Protein Sequence 


NOV83a Residues/ 
Match Residues


Identities/ 
Similarities for the Matched Region 


NOV83b 


1..101

1..101


101/101 (100%) 
101/101 (100%) 
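
The identity figure in Table 83B can be reproduced with a position-by-position comparison. The following is a minimal Python sketch for two gap-free sequences of equal length. The function name identity_summary is hypothetical, and truncating the percentage is an assumption about how the tables round. Similarity counts are not computed here, because they depend on a substitution matrix that the source does not specify.

```python
def identity_summary(query: str, match: str) -> str:
    """Report identities between two equal-length, gap-free aligned sequences."""
    if len(query) != len(match):
        raise ValueError("sequences must be aligned to the same length")
    identical = sum(a == b for a, b in zip(query, match))
    # Integer division truncates the percentage (assumed rounding rule).
    percent = 100 * identical // len(query)
    return f"{identical}/{len(query)} ({percent}%)"


# NOV83a and NOV83b encode the same 101-residue protein.
nov83a = (
    "MSDLEAKPSTEHLGDKIKDEDIKLRVIGQDSSEIHFKVKMTTPLKKLKKSYCQRQGVP"
    "VNSLRFLFEGQRIADNHTPEELGMEEEDVIEVYQEQIGGHSTV"
)
nov83b = nov83a
print(identity_summary(nov83a, nov83b))  # 101/101 (100%)
```

Alignments that contain gaps, such as those in the Geneseq and BLASTP tables, would need the aligned strings with gap characters rather than the raw sequences.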



Further analysis of the NOV83a protein yielded the following properties shown in 
Table 83C.



Table 83C. Protein Sequence Properties NOV83a


PSort 
analysis: 


0.6500 probability located in cytoplasm; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.0000 probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV83a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 83D. 



Table 83D. Geneseq Results for NOV83a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV83a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM79976 


Human protein SEQ ID NO 3622 -
Homo sapiens, 125 aa.
[WO200157190-A2, 09-AUG-2001]


1..101 
25..125


100/101 (99%) 
100/101 (99%) 


1e-52


AAM78992 


Human protein SEQ ID NO 1654 - 
Homo sapiens, 101 aa. 
[WO200157190-A2, 09-AUG-2001] 


1..101 
1..101 


100/101 (99%) 
100/101 (99%) 


1e-52


AAY49967 


Human sentrin protein sequence - 
Homo sapiens, 101 aa. [US5985664-
A, 16-NOV-1999]


1..101 
1..101 


89/101 (88%) 
94/101 (92%) 


2e-45 


AAW87984 


Ubiquitin-like domain of the protein 
SUMO1 - Mammalia, 101 aa.
[WO9857978-A1, 23-DEC-1998]


1..101 
1..101


89/101 (88%) 
94/101 (92%) 


2e-45 


AAW60079 


Homo sapiens sentrin-1 polypeptide
- Homo sapiens, 101 aa. 
[WO9820038-A1, 14-MAY-1998] 


1..101 
1..101 


89/101 (88%) 
94/101 (92%) 


2e-45 



In a BLAST search of public sequence databases, the NOV83a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 83E.



Table 83E. Public BLASTP Results for NOV83a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV83a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities 
for the 
Matched 
Portion 


Expect 
Value 


ridings 

\JyD\JOo 


Ubiquitin-like protein SMT3C precursor
(Ubiquitin-homology domain protein 
PIC1) (Ubiquitin-like protein UBL1) 
(Ubiquitin-related protein SUMO- 1 ) 
(GAP modifying protein 1) (GMP1) 

101 aa. 


1..101

1..101 


89/101 (88%)

94/101(92%) 




Q9MZD5 


SENTRIN - Cervus nippon (Sika deer), 
101 aa. 


1..101 
1..101 


88/101(87%) 
93/101(91%) 


2e-44 


O57686


SUMO-1 PROTEIN - Xenopus laevis 
(African clawed frog), 102 aa. 


1..100 
1..101 


83/101 (82%) 
90/101 (88%)


2e-39 


Q9PT08 


SMALL UBIQUITIN-RELATED 
PROTEIN 1 - Oncorhynchus mykiss 
(Rainbow trout) (Salmo gairdneri), 101 
aa. 


1..97 
1..97 


72/97 (74%) 
84/97(86%) 


9e-35 


Q9D466 


4933411G06RIK PROTEIN - Mus
musculus (Mouse), 117 aa.


1..97 
1..96 


68/97(70%)
80/97(82%) 


8e-30 



PFam analysis predicts that the NOV83a protein contains the domains shown in the 
Table 83F. 



Table 83F. Domain Analysis of NOV83a 


Pfam Domain 


NOV83a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


ubiquitin: domain 1 of 1 


20..95 


14/83 (17%)
66/83 (80%) 


4.7e-18 



Example 84. 



The NOV84 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 84A. 



Table 84A. NOV84 Sequence Analysis




SEQ ID NO: 237


912 bp 


NOV84a, 

CG59586-01 DNA Sequence 


ACTCACTAATGGGCTCGAGCGGCTGCCTGTGTTTCAGCGGCTCGGGGAAATCCACCGT 
GGGCGCCCTGCTGGCATCTGAGCTGGGATGGAAATTCTATGATGCTGATGATTATCAC

C CGG Ann AAA ATfY3AAf2fiA a n. aTCfift a & & & finr at & rrv r ,r rr' a a tp a p r* a ppappf^a 

TGTGGTTCT AGCCTCJ TTC ACZCCCTCi A Af3 A A A A CdT A C A f2 Afi A C A T ATT A A P A P A A CiCZ A 

AAAfZATWITYVTAflf'Tr'Tn A a^TY^TT" a ci/z a fiTf^cinn b a afiraarra aafiraftrrTriafia 
T(5P AfiPTP CTfifiTfifiTr raT ptv a rrwyTrPTTTP a ryTp a tptpt/v a pp ptt a pt 

J.Vj\ v nVJUl^^lV7VJiV9UlCL^l^l\7Mot»kjuoiL.ul 1 luAwlLAiLlLluuALuLI lAV^l 

CAAAAGAGAGGGACATTTTATGCCCCCTGAATTATTGCAGTCCCAGTTTGAGACTCTG 
GAGCCCCCAGCAGCTCCAGAAAACTTTATCCAAATAAGTGTGGACAAAAATGTTTCAG 
AGATAATTGCTACAATTATGGAAACCCTAAAAATGAAATGACAATGATTTTGTATCAG 
TGGTCCAAACAGAACTAAGCATAAATCATTGTGCCATCCCAAACCTCGTTCCAGCCGC


CTTGCCCATACTAGATTCTAAATGTTTCTAAAGGCAAACCCCAATGTGTCAAGACAGA 


CTTGTTTAGGTGTAATTTTAGGAATTATGCTGGTTCATCAGGAAGCAGAGGGGGAGTT


TTAAAAGTCAAGCTTAAATTGAAGTTTAAATTCATCTATAACCAAATCAAATGATCAG 


AGGAAATTCTGTAATCAATGCTGGAAATCGTTACATTGTTTAGAACATTCTTGCTCAT 


GCCTGTATTTGCACAAATAAATGAAACTTCGCTGTAAAAAAA 




ORF Start: ATG at 9 


ORF Stop: TGA at 561




SEQ ID NO: 238 


184 aa MW at 20352.2kD


NOV84a, 

CG59586-01 Protein Sequence 


MGSSGCLCFSGSGKSTVGALLASELGWKFYDADDYHPEENRRKMGKGIPLNDQDRIPW
LCNLHDILLRDVASGQRVVLACSALKKTYRDILTQGKDGVALKCEESGKEAKQAEMQL
LVVHLSGSFEVISGRLLKREGHFMPPELLQSQFETLEPPAAPENFIQISVDKNVSEII
ATIMETLKMK 



Further analysis of the NOV84a protein yielded the following properties shown in 
Table 84B. 



Table 84B. Protein Sequence Properties NOV84a 


PSort 
analysis: 


0.6500 probability located in cytoplasm; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.1000 probability located in plasma membrane 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV84a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 84C. 



Table 84C. Geneseq Results for NOV84a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV84a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG73989 


Human colon cancer antigen protein 
SEQ ID NO:4753 - Homo sapiens,
193 aa. [WO200122920-A2, 05-APR- 
2001] 


10..184
19..193 


175/175 (100%) 
175/175 (100%) 


1e-97


AAB58998


antigen protein sequence SEQ ID 706
- Homo sapiens, 193 aa.
[WO200055173-A1, 21-SEP-2000]


19..193


175/175 (100%)


1e-97


AAM89100 


Human immune/haematopoietic 
antigen SEQ ID NO: 16693 - Homo 
sapiens, 133 aa. [WO200157182-A2, 
09-AUG-2001] 


24..126
22..124 


70/103 (67%) 
77/103 (73%) 


1e-34


AAG50675 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 64243 - Arabidopsis 
thaliana, 175 aa. [EP1033405-A2, 06- 
SEP-2000] 


10..179
4..167 


75/173 (43%) 
102/173 (58%) 


4e-28 


AAG50674 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 64242 - Arabidopsis 
thaliana, 187 aa. [EP1033405-A2, 06- 
SEP-2000] 


10..179 
16..179 


75/173 (43%) 
102/173 (58%) 


4e-28 



In a BLAST search of public sequence databases, the NOV84a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 84D. 



Table 84D. Public BLASTP Results for NOV84a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV84a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


BAB74785 


GLUCONOKINASE - Anabaena 
sp. (strain PCC 7120), 160 aa. 


10..183 
9..160 


72/174(41%) 
101/174 (57%) 


1e-30


Q9RT56 


THERMORESISTANT 
GLUCONOKINASE - 
Deinococcus radiodurans, 172 aa. 


10..183 
4..159 


66/174 (37%) 
101/174 (57%) 


1e-29


CAC93415 


PUTATIVE GLUCONOKINASE 
(EC 2.7.1.12) - Yersinia pestis, 167 
aa. 


10..174 
12..159 


68/166 (40%) 
95/166 (56%)


2e-29 


Q9CMM6 


GLK - Pasteurella multocida, 172 
aa. 


10..182 
15..169 


68/174 (39%) 
99/174(56%) 


2e-29 


AAK86014 


AGR_C_329P - Agrobacterium 
tumefaciens str. C58 (Cereon), 163 
aa. 


10..182 
5..159 


74/173 (42%) 
98/173 (55%) 


6e-29 



PFam analysis predicts that the NOV84a protein contains the domains shown in the 
Table 84E.



Table 84E. Domain Analysis of NOV84a


Pfam Domain 


NOV84a Match Region 


Identities/ 
Similarities 
for the Matched Region


Expect Value 


SKI: domain 1 of 1 


9..182


37/206(18%)
114/206(55%) 


1.1 



Example 85. 

The NOV85 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 85A. 



Table 85A. NOV85 Sequence Analysis 




SEQ ID NO: 239


4332 bp 


NOV85a, 

CG59704-01 DNA Sequence 


GGCGTATTAACGCGCGGGTGCACACCCCCACGGGCGCGCAATGAACAACTATGTGCTT 


AATGACGAGATCGGCCAGGGTGCCTTCAGCACTATTTACAAGGGCCGCTATCGCACCA
CCACCGAGTTCTACGCGATTGCTTCCATCGACAAGAAGCGACGGGAGCGCGTCGTGAA
CTGCGTTCAGCTGTTACGCTCCATGCACCACTCAAACGTCATAGAGTTCCACAACTGG 
TATGAGACCAACAATCACTTGTGGATCATTACGGAGTACTGCACCGGCGGAGACATGA 
GCACGATCCTCCGCTCGAACATTAATCTCACCACTCAGGCGGTCCAGGCGTTCGGCCG
TGATGTGGCGATGGGCCTCATGTACATCCACAGTAAGGGTGTCGTGTATAACGACTTG 
CAGACTCGCAATCTGCTGATGGACTCCGCAGCAATGCTGCGCTTCCACGACTTTAGCT 
TGGCCTGTCTCTTCCAAGACGCGGCGACGCGGCCACTGGTGGGGACGCCACTGTACAT
GGCCCCCGAGTTGTTCATGGCGGATCGCCCGCTGTACTCGATGGCATCAGACCTGTGG 
TCCTTCGGTTGTGTGCTGCACGAGCTGGCGACAGGCAAGCCGCCCTTTGCCGCATCCG
ACCTCGAGACGCTGCTGGGCGACATACTGACGAGTCCGACGCCAGCGGTGCCTGGTGC 
GCCGGAGTCCTTTCAAACGCTCCTGTGCGGCCTGCTGGAAAAGGACCCGTTGAAGCGC 
TACGCGTGGGTCGATGTTGTCCGCAGCGAGTTCTGGGATGAGCCCTTGCCGCTGCCGA
GCAACGGCTTTCCATCTCAGGTGGCGTGGGAGGACTACAAGCGTTCGCGTTCTGGACG 
CGGTGCGAGTCAGTATAATTGGACGGACTCCGATGTGCGTGTGGCAGTGGCTCACGCC
GTGGGGGCAGCGAAATCAAACGCTTCTACGCACAACGTGGAGGAGAGGGAGCGAGCGG
CTGCGACGTTGAACGTCGCGAAGGAGCTGGACTTCACTGCAAGCGCGGCGATGTTGCT
GGAACGGTTACCGGAGCGGACACAGGAGCGTGCTGCGCACGCAACAGGCCATGTCGCG
ACCGCGCACGGCAGCCTGGTGCACGGCTGCCCATCCACGGCCTCAGCGGCGACCTCGC 
CAAGACGTTCAAGGACAAGGCGGCGCTGCTCAAGATTGTGGAAGAGGTCAAAACCGCT
GTCGAGGGCTTCAAGCCGTGGGTGTCCTTCCACGTCGCTGCGCCACCCGGGCATGAGG 
GAGCGCCACTGGACCGGCTTGTCTCAGAAGCTCGGGATGAAGCTGGTGCCTGGCGACA
CACTGATGCTTCTGGAGGACTGCGAACCGCTGCTAGCGCACCGCGACACCATTATCAG
CTACTGCGAGGTGGCCGCGAAGGAGTCGCAGATCGAGATGACGCTCAAGGACATGCGT 
GCCAAGTGGGAGACCAAGTGCTTCATCATCGAGGCATACAAGGAGACAGGCACGTACA 
TCCTCAAGGACACCTCCGAGGTGGTGGAGCTCCTCGACGAGCACCTCAACGTCGTCCA 
GCAGCTGCAGTTCTCTCCATTCAAGGGCTACTTCGAGGAGTCCATCACGGACTGGGAG 
CGCTCCCTCAACCTCATCTCCGACATACTCGAACAATGGCTGGAGTGCCAGCGAGCGT 
GGCGTTATCTGGAGCQ3ATCCTCAACTCGGAGGACATCGCCATGCAGCTACCGCGACT 
GTCCACGCTGTTCGAGAAGGTGGACCGCACATGGAGACGTGTCATGGGCAACGCGCAC 
GCGCAGCCAAACGCACTCGAGTACTGCATTGGCACAAACAAGCTCTTGGACCACCTGC 
GCGAGGCGAACCGGCTCCTCGAAGTGCTGCAGCACTTGATGGCGCAGAAGGTCAACGT 
TGCCGCTGTTGGTCCGACTGGCACCGGCAAGTCCATCTCACTCGCGCGTCTCGTGCTT 
GGCGGCGGCATGCCGGCCAACTTTCTTGGCCTCAACTTCACCTTCTCGGCGCAGACAA 
AGTGCACAGTCTTGCAGAATTCACTGATGGCCAAGTTCGATAAGCGGCGCTCGCACGT 
CTACGGCGCCCCTGCCGGTAAGCACTTTCTCATCTTCATTGACGACGCGAACCTGCCG 
CAGCCAGAGAAGTACGGCGCGCAGCCCCCGGTGGAGCTTCTGCGGCAGATGCTCGCCC
AAGGCGGCTTCTACAACTTTACAGGTGGCATCAAGTGGTCCTCCATCATCGACTGCTC 
GCTTGCGCTGGCGATGGGGCCGCCTGGCGGGGGCCGCAGCCGGGTTTCGAACCGCTTT 
ATGCGCTACTTCAATTACCTTGCCTTCCCCGAGATGTCGGACATGTCGAAGCGAACGA 
TCTTGCAGGCCATCCTCGTCGGCGGCCTCGCGCAGAGCGGCCTCGCTGACCGCCTCGC 
GAACGTCGCCTCCGCCGTGGTCGATAGCACGTTGCGGGTGTTTCGCAAGTGCACCCAG 
GTCTTTCTGCCGACCCCGGCGCACGTGCACTACTCCTTCAACATGCGGGATGTGATGC
GTGTTTTTCCCCTCTTGTACACAGCAGACAAGTCGGTGCTGCAGTCGGAGGAATCCAT 
CGTGCGGCTGTGGATGCACGAGATGCAGCGCGTCTTCTACGATCGCCTCGTCGACGCG 
ACAGACAAGGGTCTGTTCATCGAGTACCTCAATGCCGAGCTGCCGTCCATGGGGGTGG 
ACAAGTCCTACAACGAGGTAGTGAAGGCTGACCGCCTCATCTTTGCCGACGTACTGAG 
CGACAAGGGCGTGTACGAGCAGATTACCGACATGAACGCCCTCACGACACGCATGAAT
GAGCTGCTGGAGGCGTACAATGACGAGAATGAAGTGAAGATGAACCTCGTGCTCTTCC 
TCGACGCCATCGAGCATGTCTGCCGTATCTCGCGCGTGCTGCGACTGCCGAACGGGCA 
CTGCCTCCTCCTCGGCGTTGGCGGGTCGGGACGCAAGTCACTCACGCGCCTGGCTTGT
TCTCTGATTGCCGAGATGGAGGTGTTCACGATTGAGCTGTCGAAGAACTTCGGTGTCA
AGGAATGGCACGAGAGCCTCGCGAAGTTGCTGCTCGAGTGTGGCAAGGACGAGAAGAA
GCGGACGTTTCTCTTCGCCGACACCCAGCTGGCGCATCCGACGTTTCTGGAGGATGTG 
GCGGGCCTGCTCACATCGGGTGATGTGCCGAACCTCTTTGAGGACCAAGATATCGAGC 
TCATCAACGACAAGTTTCGCGGCGTCTGCCTAAGCGAGAACCTGCCAACGACGAAGGT 
GTCGGTGTACGCGCGCTTTGTGAAGGAGGCGCGAGCCAACCTGCACCTTGTGCTCGCC
TTCTCTCCCATCGGAGAGGCGTTTCGCAGCCGCCTGCGTATGTTCCCATCGCTCATTG 
CGTGCTGCACAATCGACTGGTTTGCTGAGTGGCCATCCGAGGCGCTACTGTCGGTAGC 
CGCAGTGCAGCTGAACGCCGGCGACGTTACTGACGTCATGGGGGCGGCAAGCCATGCC 
GACTTGCCGGGCTGCTTCCAGGCAGTGCACCGCGCGGCGGCGGAGGTGACGGAGCGCT 
TCTTCACGGAAACGCGTCGTCGCTCGTACGTGACGCCGACGTCCTATCTGTCGCTCCT 
CTCCAACTTCAAAGTGATGGCGGCGGCAAAACGCCGCTTCGTTCGCGAGCAGCGCGGC 
CGCCTCGAGAAGGGGCTGGAGAAGCTGCGGCACACCGAGGTGCAAGTGGCGGAGCTGG
AGGCCCAGCTCAAGGCGCAGCAGCCGGTTCTGGTGCAGAAAAAGGCAGAGATTCAGTC
GATGATGGAGCGGCTGACGGTGGACCGAAAGGAGGCGGCGGTGAAGGAGGCGGACGCG 
CGCAGGGAGGCCCAGCTTCCCGGTGGCCGTGCTGCATACGGCGGTGAAGATGACGAAT 
GAGCCGCCGATGGGGCTGCGGGCGAACGTGATGCGCTCCTACTACGGCTTCACTCCCG
AGGACCTCGAGCAGGAGGAGAAGCCCGCCGAGTTCAAAAAGATGTTGATGGCATCCGC 


ATGCCTGGTCCCATACCCGAGCACTGAAGAGCAGGGTCTCTGGAGCCTGGCATCGTGG 


GGTGGCCCTCAGCTTCCCCACTCACTGTGGGAAGTTTCCTTAGTGTCTCTGAGCCTGT 


TTCCTCATCCGTTGCCTGAGGATAAACCTGCTTCAGGATTGTTGGTGAAAAGACTTCC 


CTCACCTAGCTTCTGTAACGCCACTGCATGCCACCACTGCTGAGTACTGTTTGTTTGC 


TAGGTTGGTGTCATTCTCATTTTACCAGAAAGTGAAGCTC 




ORF Start: ATG at 41 


ORF Stop: TGA at 3944 




SEQ ID NO: 240 


1301 aa MW at 146115.7kD


NOV85a, 

CG59704-01 Protein Sequence 


MNNYVLNDEIGQGAFSTIYKGRYRTTTEFYAIASIDKKRRERVVNCVQLLRSMHHSNV
IEFHNWYETNNHLWIITEYCTGGDMSTILRSNINLTSQAVQAFGRDVAMGLMYIHSKG
VVYNDLQTRNLLMDSAAMLRFHDFSLACLFQDAATRPLVGTPLYMAPELFMADRPLYS
MASDLWSFGCVLHELATGKPPFAASDLETLLGDILTSPTPAVPGAPESFQTLLCGLLE
KDPLKRYAWVDVVRSEFWDEPLPLPSNGFPSQVAWEDYKRSRSGRGASQYNWTDSDVR
VAVAHAVGAAKSNASTHNVEERERAAATLNVAKELDFTASAAMLLERLPERTQERAAH
ATGHVATAHGSLVHGCPSTASAATSPRRSRTRRRCSRLWKRSKPLSRASSRGCPSTSL
RHPGMRERHWTGLSQKLGMKLVPGDTLMLLEDCEPLLAHRDTIISYCEVAAKESQIEM
TLKDMRAKWETKCFIIEAYKETGTYILKDTSEVVELLDEHLNVVQQLQFSPFKGYFEE
SITDWERSLNLISDILEQWLECQRAWRYLEPILNSEDIAMQLPRLSTLFEKVDRTWRR
VMGNAHAQPNALEYCIGTNKLLDHLREANRLLEVLQHLMAQKVNVAAVGPTGTGKSIS
LARLVLGGGMPANFLGLNFTFSAQTKCTVLQNSLMAKFDKRRSHVYGAPAGKHFLIFI
DDANLPQPEKYGAQPPVELLRQMLAQGGFYNFTGGIKWSSIIDCSLALAMGPPGGGRS
RVSNRFMRYFNYLAFPEMSDMSKRTILQAILVGGLAQSGLADRLANVASAVVDSTLRV
FRKCTQVFLPTPAHVHYSFNMRDVMRVFPLLYTADKSVLQSEESIVRLWMHEMQRVFY
DRLVDATDKGLFIEYLNAELPSMGVDKSYNEVVKADRLIFADVLSDKGVYEQITDMNA
LTTRMNELLEAYNDENEVKMNLVLFLDAIEHVCRISRVLRLPNGHCLLLGVGGSGRKS
LTRLACSLIAEMEVFTIELSKNFGVKEWHESLAKLLLECGKDEKKRTFLFADTQLAHP
TFLEDVAGLLTSGDVPNLFEDQDIELINDKFRGVCLSENLPTTKVSVYARFVKEARAN
LHLVLAFSPIGEAFRSRLRMFPSLIACCTIDWFAEWPSEALLSVAAVQLNAGDVTDVM
GAASHADLPGCFQAVHRAAAEVTERFFTETRRRSYVTPTSYLSLLSNFKVMAAAKRRF
VREQRGRLEKGLEKLRHTEVQVAELEAQLKAQQPVLVQKKAEIQSMMERLTVDRKEAA
VKEADARREAQLPGGRAAYGGEDDE



Further analysis of the NOV85a protein yielded the following properties shown in 
Table 85B. 



Table 85B. Protein Sequence Properties NOV85a 


PSort 
analysis: 


0.8800 probability located in nucleus; 0.3562 probability located in microbody 
(peroxisome); 0.1671 probability located in lysosome (lumen); 0.1000 
probability located in mitochondrial matrix space 


SignalP 
analysis: 


No Known Signal Sequence Predicted



A search of the NOV85a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 85C. 



Table 85C. Geneseq Results for NOV85a 


Geneseq
Identifier


Protein/Organism/Length [Patent
#, Date]


NOV85a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Region


Expect
Value


AAM79863 


Human protein SEQ ID NO 3509 - 
Homo sapiens, 2127 aa.
[WO200157190-A2, 09-AUG-2001] 


602..1287
168..847


218/692 (31%) 
347/692 (49%) 


1e-89


AAM79862 


Human protein SEQ ID NO 3508 - 
Homo sapiens, 2127 aa.
[WO200157190-A2, 09-AUG-2001] 


602..1287 
168..847 


218/692(31%) 
347/692 (49%)


1e-89


AAM78879 


Human protein SEQ ID NO 1541 - 
Homo sapiens, 2143 aa. 
[WO200157190-A2, 09-AUG-2001] 


602..1287
108..787 


218/692 (31%) 
347/692 (49%) 


1e-89


AAM78878 


Human protein SEQ ID NO 1540 - 
Homo sapiens, 2067 aa. 
[WO200157190-A2, 09-AUG-2001] 


602..1287 
108..787 


218/692 (31%) 
347/692 (49%) 


1e-89


AAM80293 


Human protein SEQ ID NO 3945 - 
Homo sapiens, 1774 aa. 
[WO200157190-A2, 09-AUG-2001] 


910..1293
33..405 


153/393 (38%) 
227/393 (56%) 


5e-70 



In a BLAST search of public sequence databases, the NOV85a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 85D.



Table 85D. Public BLASTP Results for NOV85a


Protein
Accession
Number


Protein/Organism/Length 


NOV85a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


AAL37427 


CILIARY DYNEIN HEAVY 
CHAIN 7 - Homo sapiens (Human), 
4024 aa. 


628..1293
1975..2655


271/692 (39%) 
395/692 (56%) 


e-132 


Q27812 


DYNEIN HEAVY CHAIN 
ISOTYPE 7B (EC 3.6.1.3) - 
Tripneustes gratilla (Hawaiian sea
urchin), 1314 aa (fragment).


601..1247 
654..1310


264/667 (39%) 
389/667 (57%) 


e-127 


Q9MBF8 


1 BETA DYNEIN HEAVY CHAIN 
- Chlamydomonas reinhardtii, 4513 
aa. 


611..1293
2486..3159


257/693 (37%) 
377/693 (54%) 


e-117 


Q9VJC6 


DHC36C PROTEIN - Drosophila 
melanogaster (Fruit fly), 4010 aa. 


596..1275
1913..2604


249/699 (35%) 
383/699 (54%) 


e-116 


Q9VWZ3 


DHC16F PROTEIN - Drosophila 
melanogaster (Fruit fly), 4081 aa. 


618..1301
2022..2709


248/704 (35%) 
380/704 (53%) 


e-108 



PFam analysis predicts that the NOV85a protein contains the domains shown in the 
Table 85E. 



Table 85E. Domain Analysis of NOV85a


Pfam Domain


NOV85a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


pkinase: domain 1 of 1 


4..250


80/286 (28%) 
190/286 (66%) 


6.8e-62 


DEAD: domain 1 of 1 


613..637 


7/25 (28%)
22/25 (88%) 


0.83 


dNK: domain 1 of 1 


865..1020


32/179(18%) 
101/179(56%)


6.8 



Example 86. 



The NOV86 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 86A.



Table 86A. NOV86 Sequence Analysis




SEQ ID NO: 241


1420 bp 


NOV86a, 

CG59628-01 DNA Sequence 


GTCCAGCTTTAGCTCTCTGCTCGCCGCCGCCGCTGTCGCCGCCACCTCCTCTGATCTA 


CGAAAGTCATGTTACCCAACACCGGGAGGCTGGCAGGATGTACAGTTTTTATCACAGG 
TGCAAGCCGTGGCATTGGCAAAGCTATTGCATTGAAAGCAGCAAAGGATGGAGCAAAT
ATTGTTATTGCTGCAAAGACCGCCCAGCCACATCCAAAACTTCTAGGCACAATCTATA
CTGCTGCTGAAGAAATTGAAGCAGTTGGAGGAAAGGCCTTGCCATGTATTGTTGATGT
GAGAGATGAACAGCAGATCAGTGCTGCAGTGGAGAAAGCCATCAAGAAATTTGGAGGA
ATTGATATTCTGGTAAATAATGCCAGTGCCATTAGTTTGACCAATACATTGGACACAC 
CTACCAAGAGATTGGATCTGATGATGAACGTGAACACCAGAGGCACCTACCTTGCATC
TAAAGCATGTATTCCTTATTTGAAAAAGAGCAAAGTTGCTCATATCCTCAATATCAGT 
CCACCACTGAACCTAAATCCAGTTTGGTTCAAACAGCACTGTGCTTATACCATTGCTA
AGTATCGTATCTCTATGTATGTGCTTGGAATGGCAGAAGAATTTAAAGGTGAAATTGC 
AGTCAATGCATTATGGCCTAAAACAGCCATACACACTGCTGCTATGGATATGCTGGGA
GGACCTGGTATCGAAAGCCAGTGTAGAAAAGTTGATATCATTGCAGATGCAGCATATT

k-\_>l J. 111 (.VrHnnnutL/lnnnHU 1 1 i. X/iV_ 1 (jVj^riHL. 1 J. i\3± k_>A 1 lunlunnnnlnlVl J- 

AAAAGAAGAAGGAATAGAAAATTTTGACGTTTATGCAATTAAACCAGGTCATCCTTTG 
CAACCAGATTTCTTCTTAGATGAATACCCAGAAGCAGTTAGCAAGAAAGTGGAATCAA 
CTGGTGCTGTTCCAGAATTCAAAGAAGAGAAACTGCAGCTGCAACCAAAACCACGTTC 
TGGAGCTGTGGAAGAAACATTTAGAATTGTTAAGGACTCTCTCAGTGATGATGTTGTT 
AAAGCCACTCAAGCAATCTATCTGTTTGAACTCTCCGGTGAAGATGGTGGCACGTGGT 
TTCTTGATCTGAAAAGCAAGGGTGGGAATGTCGGATATGGAGAGCCTTCTGATCAGGC
AGATGTGGTGATGAGTATGACTACTGATGACTTTGTAAAAATGTTTTCAGGTAAACTA 
AAACCAACAATGGCATTCATGTCAGGGAAATTGAAGATTAAAGGTAACATGGCCCTAG 
CAATCAAATTGGAGAAGCTAATGAATCAGATGAATGCCAGACTGTGAAGGAAAATATA
AAAAAAAAGTCGACTGCTATGCTCAAAAAGTAAAAAAAGCTCT^CJVGTTAAAATCrAA 


TGTTTGTTTTCTTTCCTGTTATATTATA




ORF Start: ATG at 67 


ORF Stop: TGA at 1321




SEQ ID NO: 242 


418 aa 


MW at 45394.2kD 


NOV86a, 

CG59628-01 Protein Sequence 


MLPNTGRLAGCTVFITGASRGIGKAIALKAAKDGANIVIAAKTAQPHPKLLGTIYTAA
EEIEAVGGKALPCIVDVRDEQQISAAVEKAIKKFGGIDILVNNASAISLTNTLDTPTK
RLDLMMNVNTRGTYLASKACIPYLKKSKVAHILNISPPLNLNPVWFKQHCAYTIAKYG
MSMYVLGMAEEFKGEIAVNALWPKTAIHTAAMDMLGGPGIESQCRKVDIIADAAYSIP
QKPKSFTGNFVIDENILKEEGIENFDVYAIKPGHPLQPDFFLDEYPEAVSKKVESTGA
VPEFKEEKLQLQPKPRSGAVEETFRIVKDSLSDDVVKATQAIYLFELSGEDGGTWFLD
LKSKGGNVGYGEPSDQADVVMSMTTDDFVKMFSGKLKPTMAFMSGKLKIKGNMALAIK
LEKLMNQMNARL



Further analysis of the NOV86a protein yielded the following properties shown in 
Table 86B. 



Table 86B. Protein Sequence Properties NOV86a 


PSort
analysis:


0.5500 probability located in endoplasmic reticulum (membrane); 0.5000 
probability located in microbody (peroxisome); 0.1900 probability located in 
lysosome (lumen); 0.1000 probability located in endoplasmic reticulum (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV86a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 86C.



Table 86C. Geneseq Results for NOV86a


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV86a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG81260 


Human AFP protein sequence SEQ 
ID NO:38 - Homo sapiens, 418 aa. 
[WO200129221-A2, 26-APR-2001] 


1..418 
1..418 


418/418(100%) 
418/418(100%) 


0.0 


AAB84367 


Amino acid sequence of human 
alcohol dehydrogenase 21612 - Homo 
sapiens, 418 aa. [WO200144446-A2, 
21-JUN-2001] 


1..418
1..418 


418/418(100%) 
418/418(100%)


0.0 


AAG81258 


Human AFP protein sequence SEQ 
ID NO:34 - Homo sapiens, 383 aa. 
[WO200129221-A2, 26-APR-2001] 


1..382 
1..382 


382/382(100%)
382/382(100%)


0.0 


ABB10251


Human cDNA SEQ ID NO: 559 - 
Homo sapiens, 278 aa. 
[WO200154474-A2, 02-AUG-2001] 


141..418 
1..278 


271/278(97%) 
274/278 (98%)


e-156 


AAU23020 


Novel human enzyme polypeptide 
#106 - Homo sapiens, 278 aa. 
[WO200155301-A2, 02-AUG-2001] 


141..418 
1..278 


271/278(97%)
274/278(98%)


e-156 



In a BLAST search of public sequence databases, the NOV86a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 86D. 



Table 86D. Public BLASTP Results for NOV86a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV86a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


CAC38510 


SEQUENCE 37 FROM PATENT 
WO0129221 - Homo sapiens
(Human), 418 aa. 


1..418 
1..418 


418/418(100%) 
418/418(100%) 


0.0 


CAC38508 


SEQUENCE 33 FROM PATENT 
WO0129221 - Homo sapiens
(Human), 383 aa. 


1..382 
1..382 


382/382 (100%) 
382/382 (100%) 


0.0 


Q99LV2 


HYPOTHETICAL 54.9 KDA 
PROTEIN - Mus musculus 
(Mouse), 496 aa. 


1..418 
1..496 


355/496 (71%) 
390/496 (78%) 


0.0


Q9BT58


SIMILAR TO RIKEN CDNA
2610207I16 GENE - Homo
sapiens (Human), 345 aa. 


163..418 
90..345 


253/256 (98%) 
254/256 (98%) 


e-143 


Q9VB10 


CG5596 PROTEIN (GH01709P) -
Drosophila melanogaster (Fruit 
fly), 412 aa. 


4..418 
3..412 


238/422 (56%) 
300/422 (70%) 


e-128 



PFam analysis predicts that the NOV86a protein contains the domains shown in the 
Table 86E. 



Table 86E. Domain Analysis of NOV86a 


Pfam Domain 


NOV86a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


beta-lactamase: domain 1 of 
1 


222..236 


4/15(27%) 
14/15(93%) 


6.5 


adh_short: domain 1 of 1 


9..321 


74/339 (22%) 
211/339(62%) 


2.4e-29 


SCP2: domain 1 of 1 


306..415 


41/114(36%) 
87/114(76%) 


1.5e-25 



Example 87. 

The NOV87 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 87A. 



Table 87A. NOV87 Sequence Analysis




SEQ ID NO: 243 


888 bp 


NOV87a, 

CG59516-01 DNA Sequence


TTCAACAAGGGCCCCTCCTACAGGCTCTTGGCGGACGTCCAGAACAGGCTTCTGTTCA 
AATATGACTCCCAGAAGGAGGCAGAGCTCCGCAGCTGGATCAAGGGATTCACTGGCCT 
CTCCATCCGCCCCGACTTCCAGAAGGGCCTGAAGGACGGGATTATTTTATGCACACTC
GTGAACAAACTGCAGCCGGGCTCAGTCCCCAAGATCAACGGCTTCCGTGTAGAACTGG 
CACCAGCTAGAAAACCTCTCCAACATCCTCAAGGCAATGGTCAGCTACGGCATGATCC 
CGTGGACCTATTTGAGGCCAACGACCTGTTTGAGAGTGGGAACAATATGCAGGTGCGG 
GTGTCTCTTCTCGCCCTGGCAGGGAAGGCCAAGACTAAGGGGCTGCAGAGCGGGGTGG
ACATCCGTGACAAGTACTCAGAGAAGCAGAACTTCAACGACACCACCATGAAGGCCAG 
GCTGTGCGTCATCCGGCTGCAGATTACCAACAAATGTGCCAGCCAGTCAGGCATGACC
GCATACGTCACGAGGAGGCATCTCTACGACCCCAAGAACCGCATCCTGCCCCCCATGG 
ACAACTCGACCATCAGCCTCCGGATGGGTACAAACAAGTGCGCCAGCCAGGTGGGCAT 
GACGGCTCCCGGGAACCAGTGGCACATCTATGACACCAAGTTGGGAATCGACAAGTGT 
GAGAACTCCTCCATGTCCCTGAAGATGGGCTACACGCAGGTCGCCAATCACAGCAGAC
AGGTCTTTGGCCTAGGCCGGCAAATATATGAACCCAAGTACCAGCCGGGTGGCCCAGT 
GGCCCACGGGGCTCCCTCCGCCGGCAACTGCCCAGGGCCAGGGGAGGCCCCTTAGTAC 
CAGGAGGAGACCAGCTAC 




ORF Start: TTC at 1 


ORF Stop: TAG at 865 




SEQ ID NO: 244 


288 aa MW at 31831.2kD


NOV87a, 

CG59516-01 Protein Sequence 


FNKGPSYRLLADVQNRLLFKYDSQKEAELRSWIKGFTGLSIRPDFQKGLKDGIILCTL 
VNKLQPGSVPKINGFRVELAPARKPLQHPQGNGQLRHDPVDLFEANDLFESGNNMQVR 
VSLLALAGKAKTKGLQSGVDIRDKYSEKQNFNDTTMKARLCVIRLQITNKCASQSGMT
AYVTRRHLYDPKNRILPPMDNSTISLRMGTNKCASQVGMTAPGNQWHIYDTKLGIDKC
ENSSMSLKMGYTQVANHSRQVFGLGRQIYEPKYQPGGPVAHGAPSAGNCPGPGEAP




SEQ ID NO: 245 


888 bp 


NOV87b, 

CG59516-02 DNA Sequence


TTCAACAAGGGCCCCTCCTACAGGCTCTTGGCGGACGTCCAGAACAGGCTTCTGTTCA
AATATGACTCCCAGAAGGAGGCAGAGCTCCGCAGCTGGATCAAGGGATTCACTGGCCT
CTCCATCCGCCCCGACTTCCAGAAGGGCCTGAAGGACGGGATTATTTTATGCACACTC 
GTGAACAAACTGCAGCCGGGCTCAGTCCCCAAGATCAACGGCTTCCGTGTAGAACTGG 
CACCAGCTAGAAAACCTCTCCAACATCCTCAAGGCAATGGTCAGCTACGGCATGATCC 
CGTGGACCTATTTGAGGCCAACGACCTGTTTGAGAGTGGGAACAATATGCAGGTGCGG 
GTGTCTCTTCTCGCCCTGGCAGGGAAGGCCAAGACTAAGGGGCTGCAGAGCGGGGTGG 
ACATCCGTGACAAGTACTCAGAGAAGCAGAACTTCAACGACACCACCATGAAGGCCAG
GCTGTGCGTCATCCGGCTGCAGATTACCAACAAATGTGCCAGCCAGTCAGGCATGACC
GCATACGTCACGAGGAGGCATCTCTACGACCCCAAGAACCGCATCCTGCCCCCCATGG 
ACAACTCGACCATCAGCCTCCGGATGGGTACAAACAAGTGCGCCAGCCAGGTGGGCAT 
GACGGCTCCCGGGAACCAGTGGCACATCTATGACACCAAGTTGGGAATCGACAAGTGT 
GAGAACTCCTCCATGTCCCTGAAGATGGGCTACACGCAGGTCGCCAATCACAGCAGAC 
AGGTCTTTGGCCTAGGCCGGCAAATATATGAACCCAAGTACCAGCCGGGTGGCCCAGT 
GGCCCACGGGGCTCCCTCCGCCGGCAACTGCCCAGGGCCAGGGGAGGCCCCTTAGTAC
CAGGAGGAGACCAGCTAC 




ORF Start: TTC at 1 


ORF Stop: TAG at 865 




SEQ ID NO: 246 


288 aa MW at 31831.2kD 


NOV87b, 

CG59516-02 Protein Sequence 


FNKGPSYRLLADVQNRLLFKYDSQKEAELRSWIKGFTGLSIRPDFQKGLKDGIILCTL
VNKLQPGSVPKINGFRVELAPARKPLQHPQGNGQLRHDPVDLFEANDLFESGNNMQVR
VSLLALAGKAKTKGLQSGVDIRDKYSEKQNFNDTTMKARLCVIRLQITNKCASQSGMT
AYVTRRHLYDPKNRILPPMDNSTISLRMGTNKCASQVGMTAPGNQWHIYDTKLGIDKC
ENSSMSLKMGYTQVANHSRQVFGLGRQIYEPKYQPGGPVAHGAPSAGNCPGPGEAP
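
The correspondence between a DNA sequence and its predicted polypeptide in Table 87A can be checked with a short translation sketch. The example below is illustrative only: it assumes the standard genetic code, and the function name `translate` and its `orf_start` parameter are hypothetical names introduced here, not part of the original disclosure.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, orf_start: int = 1) -> str:
    """Translate from the 1-based position orf_start up to the first stop codon.

    Whitespace in the input is ignored; unrecognized codons become 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(orf_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*":  # stop codon ends the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


# First 27 nucleotides of the NOV87a DNA sequence (ORF start at position 1).
print(translate("TTCAACAAGGGCCCCTCCTACAGGCTC"))
```

Applied to the opening nucleotides of NOV87a, the sketch yields the first residues of the listed protein sequence (FNKGPSYRL).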



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 87B. 



Table 87B. Comparison of NOV87a against NOV87b. 


Protein Sequence 


NOV87a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV87b 


1..288 
1..288 


288/288 (100%) 
288/288 (100%) 



Further analysis of the NOV87a protein yielded the following properties shown in 
Table 87C. 



Table 87C. Protein Sequence Properties NOV87a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.2110 probability located in lysosome (lumen); 0.1000
probability located in mitochondrial matrix space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV87a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 87D. 
Table 87D. Geneseq Results for NOV87a


Geneseq
Identifier


Protein/Organism/Length [Patent #,
Date]


NOV87a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Region


Expect
Value


AAR94888 


Carponin - Homo sapiens, 297 aa. 
[JP08073380-A, 19-MAR-1996] 


1..265 
6..272 


136/269 (50%) 
176/269 (64%) 


7e-63 


AAR72588 


Carponin protein - Homo sapiens, 297 
aa. [WO9509010-A, 06-APR-1995] 


1..265 
6..272 


136/269 (50%) 
176/269 (64%) 


7e-63 


AAB43807 


Human cancer associated protein 
sequence SEQ ID NO: 1252 - Homo 
sapiens, 163 aa. [WO200055350-A1, 
21-SEP-2000] 


164..273 
4..116 


67/113 (59%) 
82/113 (72%)


6e-30 


AAM73074 


Human bone marrow expressed probe 
encoded protein SEQ ID NO: 33380 - 
Homo sapiens, 71 aa. [WO200157276-A2,
09-AUG-2001]


157..225
2..71 


49/70 (70%)
55/70 (78%)


4e-21 


AAM60434 


Human brain expressed single exon 
probe encoded protein SEQ ID NO: 
32539 - Homo sapiens, 71 aa. 
[WO200157275-A2, 09-AUG-2001] 


157..225 
2..71 


49/70 (70%)
55/70 (78%)


4e-21 



In a BLAST search of public sequence databases, the NOV87a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 87E. 



Table 87E. Public BLASTP Results for NOV87a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV87a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q08094 


Calponin H2, smooth muscle - Sus 
scrofa (Pig), 296 aa (fragment). 


1..287 
6..296 


219/291 (75%) 
237/291 (81%) 


e-116 


Q99439 


Calponin H2, smooth muscle 
(Neutral calponin) - Homo sapiens 
(Human), 309 aa. 


1..288 
6..297


218/292 (74%) 
235/292 (79%) 


e-115 


Q08093 


Calponin H2, smooth muscle - Mus 
musculus (Mouse), 305 aa. 


1..288 
6..293 


214/291 (73%) 
231/291 (78%) 


e-112 


O93547


CALPONIN H3 - Xenopus laevis 
(African clawed frog), 295 aa. 


1..269 
5..276 


179/273 (65%) 
208/273 (75%) 


6e-91 


Q922F8


MGC:8135) - Mus musculus
(Mouse), 242 aa.


1..230


179/233 (76%)


8e-83
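
The Expect values in these tables are written in an abbreviated form, where an entry such as e-116 stands for 1e-116. The helper below is a minimal sketch for converting such entries to numbers so that hits can be sorted or filtered; the function name `parse_expect` is a hypothetical name chosen for this illustration.

```python
def parse_expect(text: str) -> float:
    """Convert an Expect value as printed in the tables to a float.

    Entries such as 'e-116' omit the leading mantissa, which is taken
    here to be 1 (an assumption based on common BLAST report formatting).
    """
    value = text.strip()
    if value.lower().startswith("e"):
        value = "1" + value
    return float(value)


# Expect values taken from Table 87E, sorted from most to least significant.
hits = {"Q08094": "e-116", "Q99439": "e-115", "Q08093": "e-112", "Q922F8": "8e-83"}
for accession in sorted(hits, key=lambda acc: parse_expect(hits[acc])):
    print(accession, parse_expect(hits[accession]))
```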









PFam analysis predicts that the NOV87a protein contains the domains shown in the 
Table 87F. 



Table 87F. Domain Analysis of NOV87a 


Pfam Domain 


NOV87a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


CH: domain 1 of 1 


23..123


27/124 (22%) 
65/124 (52%) 


0.068 


calponin: domain 1 of 2 


159..183


17/26 (65%) 
21/26 (81%) 


3.8e-07 


calponin: domain 2 of 2 


198..223 


15/26 (58%) 
19/26 (73%) 


3e-08 
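
The match regions in the domain tables are given as inclusive residue ranges of the form start..end. As a small illustration, the sketch below parses such a range and reports how many residues it spans; `parse_range` and `span_length` are hypothetical helper names used only for this example.

```python
def parse_range(text: str) -> tuple[int, int]:
    """Parse a residue range written as 'start..end' into two integers."""
    start, end = text.strip().split("..")
    return int(start), int(end)


def span_length(text: str) -> int:
    """Number of residues covered by an inclusive 'start..end' range."""
    start, end = parse_range(text)
    return end - start + 1


# Match regions listed for NOV87a in Table 87F.
for domain, region in [("CH", "23..123"), ("calponin 1", "159..183"), ("calponin 2", "198..223")]:
    print(domain, region, span_length(region))
```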



Example 88. 

The NOV88 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 88A.



Table 88A. NOV88 Sequence Analysis 




SEQ ID NO: 247 


2213 bp 


NOV88a, 

CG59671-02 DNA Sequence 


CGTGAGGCACCCACTCTGGGAGCACAGAGAGCTCAGGTAGCCTGCCTAGATGGCGGCG


CGCACCCTGGGCCGCGGCGTCGGGAGGCTGCTGGGCAGCCTGCGAGGGCTCTCGGGGC 
AGCCCGCGCGGCCGCCGTGCGGGGTGAGCGCGCCGCGCAGGGCGGCCTCGGGACCCTC 
GGGCAGCGCTCCCGCAGTTGCAGCAGCAGCAGCACAGCCAGGCTCGTATCCCGCGCTG 
AGTGCACAGGCAGCCCGGGAGCCGGCCGCCTTCTGGGGGCCTCTGGCGCGGGACACTC 
TCGTGTGGGACACCCCCTACCACACCGTCTGGGACTGCGACTTCAGCACTGGCAAGAT 
CGGCTGGTTCCTGGGAGGCCAGTTAAATGTCTCTGTCAACTGCTTGGACCAGCATGTT 
CGGAAGTCCCCCGAGAGCGTTGCTTTGATCTGGGAGCGCGATGAGCCTGGAACGGAAG 
TGAGGATCACCTACAGGGAACTACTGGAGACCACGTGCCGCCTGGCCAACACGCTGAA 
GAGGCATGGAGTCCACCGTGGGGACCGTGTTGCCATCTACATGCCCGTGTCCCCATTG 
GCTGTGGCAGCAATGCTGGCCTGTGCCAGGATCGGAGCTGTCCACACAGTCATCTTTG
CTGGCTTCAGTGCAGAGTCCTTGGCTGGGAGGATCAATGATGCCAAGTGCAAGGTGGT 
TATCACCTTCAACCAAGGACTCCGGGGTGGGCGCGTGGTGGAGCTGAAGAAAATAGTG 
GATGAGGCTGTGAAGCACTGCCCCACCGTGCAGCATGTCCTGGTGGCTCACAGGACAG 
ACAACAAGGTCCACATGGGGGATCTGGACGTCCCGCTGGAGCAGGAAATGGCCAAGGA
GGACCCTGTTTGCGCCCCAGAGAGCATGGGCAGTGAGGACATGCTCTTCATGCTGTAC
ACCTCAGGGAGCACCGGAATGCCCAAGGGCATCGTCCATACCCAGGCAGGCTACCTGC 
TCTATGCCGCCCTGACTCACAAGCTTGTGTTTGACCACCAGCCAGGTGACATCTTTGG
CTGTGTGGCCGACATCGGTTGGATTACAGGACACAGCTACGTGGTGTATGGGCCTCTC 
TGCAATGGTGCCACCAGCGTCCTTTTTGAGAGCACCCCAGTTTATCCCAATGCTGGTC 
GGTACTGGGAGACAGTAGAGAGGTTGAAGATCAATCAGTTCTATGGCGCCCCAACGGC 
TGTCCGGCTGTTGCTGAAATACGGTGATGCCTGGGTGAAGAAGTATGATCGCTCCTCC 
CTGCGGACCCTGGGGTCAGTGGGAGAGCCCATCAACTGTGAGGCCTGGGAGTGGCTTC
ACAGGGTGGTGGGGGACAGCAGGTGCACGCTGGTGGACACCTGGTGGCAGACAGAAAC 
AGGTGGCATCTGCATCGCACCACGGCCCTCGGAAGAAGGGGCGGAAATCCTCCCTGCC 
ATGGCGATGAGGCCCTTCTTTGGCATCGTCCCCGTCCTCATGGATGAGAAGGGCAGCG 
TCGlXjGAGGGCAGCAACGTCTCCGGGGCCCTGTGCATCTCCCAGGCCTGGCCGGGCAT 
GGCCAGGACCATCTATGGCGACCACCAGCGATTTGTGGACGCCTACTTCAAGGCCTAC 
CCAGGCTATTACTTCACTGGAGACGGGGCTTACCGAACTGAGGGCGGCTATTACCAGA 
TCACAGGGCGGATGGATGATGTCATCAACATCAGTGGCCACCGGCTGGGGACCGCAGA
GATTGAGGACGCCATCGCCGACCACCCTGCAGTACCAGAAAGTGCTGTCATTGGCTAC 
CCCCACGACATCAAAGGAGAAGCTGCCTTTGCCTTCATTGTGGTGAAAGATAGTGCGG 
GTGACTCAGATGTGGTGGTGCAGGAGCTCAAGTCCATGGTGGCCACCAAGATCGCCAA
ATATGCTGTGCCTGATGAGATCCTGGTGGTGAAACGTCTTCCAAAAACCAGGTCTGGG 
AAGGTCATGCGGCGGCTCCTGAGGAAGATCATCACTAGTGAGGCCCAGGAGCTGGGAG
ACACTACCACCTTGGAGGACCCCAGCATCATCGCAGAGATCCTGAGTGTCTACCAGAA 
GTGCAAGGACAAGCAGGCTGCTGCTAAGTGAGCTGGCACCTTGTGGGGCTCTTGGGAT
GGGCGGGCACCCAAGCCCTGGCTTGTCCTTCCCAGAAGGTACCCCTGAGGTTGGCGTC 


TTCCTACGT 




ORF Start: ATG at 50 


ORF Stop: TGA at 2117




SEQ ID NO: 248 


689 aa MW at 74855.9kD 


NOV88a, 

CG59671-02 Protein Sequence 


MAARTLGRGVGRLLGSLRGLSGQPARPPCGVSAPRRAASGPSGSAPAVAAAAAQPGSY 
PALSAQAAREPAAFWGPLARDTLVWDTPYHTVWDCDFSTGKIGWFLGGQLNVSVNCLD
QHVRKSPESVALIWERDEPGTEVRITYRELLETTCRLANTLKRHGVHRGDRVAIYMPV
SPLAVAAMLACARIGAVHTVIFAGFSAESLAGRINDAKCKVVITFNQGLRGGRVVELK
KIVDEAVKHCPTVQHVLVAHRTDNKVHMGDLDVPLEQEMAKEDPVCAPESMGSEDMLF
MLYTSGSTGMPKGIVHTQAGYLLYAALTHKLVFDHQPGDIFGCVADIGWITGHSYVVY
GPLCNGATSVLFESTPVYPNAGRYWETVERLKINQFYGAPTAVRLLLKYGDAWVKKYD
RSSLRTLGSVGEPINCEAWEWLHRVVGDSRCTLVDTWWQTETGGICIAPRPSEEGAEI
LPAMAMRPFFGIVPVLMDEKGSVVEGSNVSGALCISQAWPGMARTIYGDHQRFVDAYF
KAYPGYYFTGDGAYRTEGGYYQITGRMDDVINISGHRLGTAEIEDAIADHPAVPESAV
IGYPHDIKGEAAFAFIVVKDSAGDSDVVVQELKSMVATKIAKYAVPDEILVVKRLPKT
RSGKVMRRLLRKIITSEAQELGDTTTLEDPSIIAEILSVYQKCKDKQAAAK
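
The ORF coordinates and the reported polypeptide lengths in the sequence tables can be cross-checked arithmetically: the distance from the ORF start to the first base of the stop codon should be a whole number of codons equal to the stated residue count. The sketch below illustrates this check; the function name `orf_codon_count` is a hypothetical name, and the interpretation of the "ORF Stop" position as the first base of the stop codon is an assumption inferred from the listed values.

```python
def orf_codon_count(orf_start: int, stop_codon_start: int) -> int:
    """Number of coding codons between the ORF start and the stop codon.

    Both positions are 1-based; stop_codon_start is assumed to be the
    first base of the stop codon, which is not itself translated.
    """
    coding_length = stop_codon_start - orf_start
    if coding_length <= 0 or coding_length % 3 != 0:
        raise ValueError("ORF coordinates do not span a whole number of codons")
    return coding_length // 3


# (ORF start, ORF stop, reported length in aa) as listed in Tables 87A and 88A.
reported = {"NOV87a": (1, 865, 288), "NOV88a": (50, 2117, 689)}
for name, (start, stop, n_aa) in reported.items():
    print(name, orf_codon_count(start, stop), orf_codon_count(start, stop) == n_aa)
```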



Further analysis of the NOV88a protein yielded the following properties shown in 
Table 88B. 



Table 88B. Protein Sequence Properties NOV88a 


PSort 
analysis: 


0.6500 probability located in plasma membrane; 0.6000 probability located in 
nucleus; 0.4340 probability located in mitochondrial inner membrane; 0.3000 
probability located in Golgi body 


SignalP 
analysis: 


Likely cleavage site between residues 23 and 24



A search of the NOV88a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 88C. 



Table 88C. Geneseq Results for NOV88a


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV88a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for
the Matched 
Region 


Expect 
Value 
AAU23058


Novel human enzyme polypeptide 
#144 - Homo sapiens, 664 aa. 
[WO200155301-A2, 02-AUG-2001] 


26..689 
1..664


663/664(99%)
663/664(99%)


0.0 


AAB34712


Human secreted protein encoded by 
DNA clone vo9 1 - Homo sapiens, 
518 aa. [WO200055375-A1, 21-SEP- 
2000] 


172..689 
1..518


518/518(100%)
518/518(100%)


0.0 


AAU23050


Novel human enzyme polypeptide 
#136 - Homo sapiens, 479 aa. 
[WO200155301-A2, 02-AUG-2001]


224..689 
18..479 


459/466(98%) 
461/466(98%) 


0.0 


ABB12253


Human acetate-coA ligase 
homologue, SEQ ID NO:2623 - 
Homo sapiens, 446 aa. 
[WO200157188-A2, 09-AUG-2001]


1..446 
1..446 


446/446(100%)
446/446(100%) 


0.0 


AAR23968


facA gene product - Penicillium 
chrysogenum, 669 aa. [WO9207079- 
A, 30-APR-1992] 


58..684 
45..669


305/629(48%)
407/629(64%)


e-175 



In a BLAST search of public sequence databases, the NOV88a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 88D. 



Table 88D. Public BLASTP Results for NOV88a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV88a 
Residues/ 
Match 
Residues


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q99NB1 


ACETYL-COA SYNTHETASE 2 - 
Mus musculus (Mouse), 682 aa. 


1..687 
1..680 


599/687 (87%) 
638/687(92%) 


0.0 


Q9BEA3 


ACETYL-COA SYNTHETASE 2 - 
Bos taurus (Bovine), 675 aa. 


1..689 
1..675 


575/689(83%) 
625/689 (90%) 


0.0 


Q9NUB1 


DJ568C11.3 (NOVEL AMP-BINDING
ENZYME SIMILAR TO ACETYL- 
COENZYME A SYNTHETHASE 
(ACETATE-COA LIGASE)) - Homo 
sapiens (Human), 478 aa (fragment). 


212..689 
1..478


478/478 (100%) 
478/478(100%) 


0.0 


Q96JI1 


KIAA1846 PROTEIN - Homo sapiens 
(Human), 354 aa (fragment). 


336..689
1..354


354/354 (100%) 
354/354(100%) 


0.0 


Q9HV66 


ACETYL-COENZYME A 
SYNTHETASE - Pseudomonas 
aeruginosa, 645 aa. 


58..675
24..639 


326/619 (52%) 
433/619 (69%) 


0.0 
PFam analysis predicts that the NOV88a protein contains the domains shown in the
Table 88E. 



Table 88E. Domain Analysis of NOV88a


Pfam Domain 


NOV88a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


AMP-binding: domain 1 of 
1 


142..580 


121/441 (27%) 
341/441 (77%) 


7.1e-117 



Example 89. 

The NOV89 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 89A. 



Table 89A. NOV89 Sequence Analysis 




SEQ ID NO: 249 


1268 bp 


NOV89a, 

CG56870-01 DNA Sequence 


ACTTCTTTCTTTTCTGTTTCAGAGTTACTGATTTATTCTTGAGATTCCTCTACTCTCG


TTATCTGACCTCATGGATGAACTTCAGGATGTTCAGCTCACAGAGATCAAACCACTTC 
TAAATGATAAGAATGGTACAAGAAACTTCCAGGACTTTGACTGTCAGGAACATGATAT 
AGAAACAACTCATGGTGTGGTCCACGTCACTATAAGAGGCTTACCCAAAGGAAACAGA 
CCAGTTATACTAACATATCATGACATTGGCCTCAACCGTAAATCCTGTTTCAATGCAT
TCTTTAACTTTGAGGATATGC^GAGATCACCCAGGACTTTCCTGTCTGTCATGTGGA 
TGCCCCAGGCC^GCAGGAAGGTGCACCCTCTTTCCCAACAGGGTATCAGTACCCCACA 
ATTOATGAGCTGGCTGAAATGCTCCCTCCTGT/TCTTACCCACCTAAGCCTGAAAAGCA 
TCATTGGAATTGGAGTTGGAGCTGGAGCTTACATCCTCAGCAGATTTGCACTCAACCA
TCCAGAGCTTGTGGAAGGCCTTGTGCTCATTAATGTTGACCCTTGCGCTAAAGGCTGG
ATTGACTGGGCAGCTTCGAAACTCTCrrcGCCTGAGAA 

TGGCTCATCACTTTGGGCAGGAAGAGTTACAGGCCAACCTGGACCTGATCCAAACCTA 
CAGAATGCATATTCCCCAAGACATC^CCAAGAGAACCTGCAGCTCTTCTTCAATTCC 
TACAATGGGCGCAGAGACCTGGAGATCGAAAGACCCATACTGGGCCAAAATGATAACA
AATCAAAAACATTAAAGTGTTCTACTTTACTGGTGGTAGGGGACAAT^ 
TGAGGCTGTCGTCGAATGCAATTCCCGCCT<3AACCCTATAAATACAACTTTGCTAAAG 
ATGGCGGACTGTGGGGGACTGCCCCAGGTAGTTCAGCCTGGGAAGCTCACCGAGGCCT
TCAAGTACTTTTTGCAGGGAATGGGCTACGTCCCGTCTGCCAGCATGACTCGGCTCGC
CCGAT<^CGAACCCACTCAACCTCGAGTAGCCTCGGCTCTGGAGAAAGTCCCTTCAGC 
CGGTCTGTCACCAGCAATCAGTCAGATGGAACTCAAGAATCCTGTGAGTCCCCTGATG
TCCTGGACAGACACCAGACCATGGAGGTGTCCTGCTAAGCAGATGCTCCTCCCCTGGA 
CCATTGCAAGTCCATCCTTCAAATGACCACTCCATAATATAACATTTCAT




ORF Start: ATG at 71 


ORF Stop: TAA at 1196




SEQ ID NO: 250 


375 aa 


MW at 41413.3kD


NOV89a, 

CG56870-01 Protein Sequence 


MDELQDVQLTEIKPLLNDKNGTRNFQDFDCQEHDIETTHGVVHVTIRGLPKGNRPVIL
TYHDIGLNRKSCFNAFFNFEDMQEITQHFAVCHVDAPGQQEGAPSFPTGYQYPTMDEL
AEMLPPVLTHLSLKSIIGIGVGAGAYILSRFALNHPELVEGLVLINVDPCAKGWIDWA
ASKLSGLTTNVVDIILAHHFGQEELQANLDLIQTYRMHIAQDINQDNLQLFLNSYNGR
RDLEIERPILGQNDNKSKTLKCSTLLVVGDNSPAVEAVVECNSRLNPINTTLLKMADC
GGLPQVVQPGKLTEAFKYFLQGMGYVPSASMTRLARSRTHSTSSSLGSGESPFSRSVT
SNQSDGTQESCESPDVLDRHQTMEVSC




SEQ ID NO: 251 


1175 bp 


NOV89b, 

CG56870-02 DNA Sequence 


TCGTTATCTGACCTCATGGATGAACTTCAGGATGTTCAGCTCACAGAGATCAAACCAC 
TTCTAAATGATAAGAATGGTAC^GAAACTTCCAGGACTTTGACTGTCAGGAACATGA 
TATAGAAACAACTCATGGTGTGGTCCACGTCACTATAAGAGGCTTACCCAAAGGAAAC 
AGACCAGTTATACTAA(^TATCATGACATTGGCCTCAACCATAAATCCTGTTTCAATO 
CATTCTTTAACTTTGAGGATATGCAAGAGATCACCCAGCACTTTGCTGTCTGTCATGT 
GGATGCCCCAGGCCAGCAGGAAGGTGCACCCTCTTTCCCAACAGGGTATCAGTACCCC
ACAATGGATGAGCTGGCTGAAGTGCTGCCTCCTGTTCTTACCCACCTAAGCCTGAAAA 
GCATCATTGGAATTGGAGTTGGAGCTGGAGCTTACATCCTC^^ 
CCATCCAGAGCTTGTGGAAGGCCTTGTGCTCATTAATGTTGACCCTTGCGCTAAAGGC
TGGATTGACTGGGCAGCTTCCAAACTCTCTGGCCTGACAACCAATGTTGTGGACATTA 
TTTTGGCTCATCACTTTGGGCAGGAAGAGTTACAGGCCAACCTGGACCTGATCCAAAC 
CTACAGAATGCATATTGCCCAAGACATCAACCAAGACAACCTGCAGCTCTTCTTGAAT
TCCTACAATGGACGCAGAGACCTGGAGATCGAAAGACCCATACTGGGCCAAAATGATA 
ACAAATCAAAAACATTAAAGTGTTCTACTTTACTGGTGGTAGGGGACAATTCGCCTGC
AGTTGAGGCTGTGGTCGAATGCAATTCCCGCCTGAACCCTATAAATACAACTTTGCTA 
AAGATGGCGGACTGTGGGGGACTGCCCCAGGTAGTTCAGCCTGGGAAGCTCACCGAGG
CCTTCAAGTACTTTTTGCAGGGAATGGGCTACATACCATCTGCCAGCATGACTCGGCT
CGCCCGATCACGAACCCACTCAACCTCGAGTAGCCTCGGCTCTGGAGAAAGTCCCTTC 
AGCCGGTCTGTCACCAGCAATCAGTCAGATGGAACTCAAGAATCCTGTGAGTCCCCTG 
ATGTCCTGGACAGACACCAGACCATGGAGGTGTCCTGCTAAGCAGATGCTCCTCCCCT
GGACCATTGCAAGTC 




ORF Start: ATG at 16


ORF Stop: TAA at 1141




SEQ ID NO: 252 


375 aa 


MW at 41376.2kD


NOV89b, 

CG56870-02 Protein Sequence 


MDELQDVQLTEIKPLLNDKNGTRNFQDFDCQEHDIETTHGVVHVTIRGLPKGNRPVIL
TYHDIGLNHKSCFNAFFNFEDMQEITQHFAVCHVDAPGQQEGAPSFPTGYQYPTMDEL
AEVLPPVLTHLSLKSIIGIGVGAGAYILSRFALNHPELVEGLVLINVDPCAKGWIDWA
ASKLSGLTTNVVDIILAHHFGQEELQANLDLIQTYRMHIAQDINQDNLQLFLNSYNGR
RDLEIERPILGQNDNKSKTLKCSTLLVVGDNSPAVEAVVECNSRLNPINTTLLKMADC
GGLPQVVQPGKLTEAFKYFLQGMGYIPSASMTRLARSRTHSTSSSLGSGESPFSRSVT
SNQSDGTQESCESPDVLDRHQTMEVSC




SEQ ID NO: 253 


1232 bp 


NOV89c, 

CG56870-03 DNA Sequence 


ACTTCTTTCTTTTCTGTTTCAGAGTTACTGATTTATTCTTGAGATTCCTCTACTCTCG


TTATCTGACCTCATGGATGAACTTCAGGATGTTCAGCTCACAGAGATCAAACCACTTC 
TAAATGATAAGGAACATGATATAGAAACAACTCATGGTGTGGTCCACGTCACTATAAG 
AGGCTTACCCAAAGGAAACAGACCAGTTATACTAACATATCATGACATTGGCCTCAAC
CATAAATCCTGTTTCAATGCATTCTTTAACTTTGAGGATATGCAAGAGATCACCCAGC 
ACTTTGCTGTCTGTCATGTGGATGCCCCAGGCCAGCAGGAAGGTGCACCCTCTTTCCC 
AACAG^TATCAGTACCCCACAATGGATGAGCTGGCTGAAATGCTGCCTCCTGTTCTT 
ACCCACCTAAGCCTGAAAAGCATCATTGGAATTGGAGTTGGAGCTGGAGCTTACATCC 
TCAGCAGATTTGCACTCAACCATCCAGAGCTTGTGGAAGGCCTTGTGCTCATTAATGT 
TGACCCTTGCGCTAAAGGCTGGATTGACTGGGCAGCTTCCAAACTCTCTGGCCTGACA 
ACCAATGTTGTGGACATTATTTTGGCTCATCACTTTGGGCAGGAAGAGTTACAGGCCA 
ACCTGGACCTGATCCAAACCTACAGAATGCATATTGCCCAAGACATCAACCAAGACAA
CCTGCAGCTCTTCTTGAATTCCTACAATGGGCGCAGAGACCTGGAGATCGAAAGACCC 
ATACTGGGCCAAAATGATAACAAATCAAAAACATTAAAGTGTTCTACTTTACTGGTGG
TAGGGGACAATTCGCCTGCAGTTGAGGCTGTGGTCGAATGCAATTCCCGCCTGAACCC 
TATAAATACAACTTTGCTAAAGATGGCGGACTGTGGGGGACTGCCCCAGGTAGTTCAG 
CCTGGGAAGCTCACCGAGGCCTTCAAGTACTTTTTGCAGGGAATGGGCTACGTCCCGT 
CTGCCAGCATGACTCGGCTCGCCCGATCACGAACCCACTCAACCTCGAGTAGCCTCGG 
CTCTGGAGAAAGTCCCTTCAGCCGGTCTGTCACCAGCAATCAGTCAGATGGAACTCAA 
GAATCCTGTGAGTCCCCTGATGTCCTGGACAGACACCAGACCATGGAGGTGTCCTGCT
AAGCAGATGCTCCTCCCCTGGACCATTGCAAGTCCATCCTTCAAATGACCACTCCATA 


ATATAACATTTCAT 




ORF Start: ATG at 71 


ORF Stop: TAA at 1160 




SEQ ID NO: 254 


363 aa 


MW at 39967.8kD 


NOV89c, 

CG56870-03 Protein Sequence 


MDELQDVQLTEIKPLLNDKEHDIETTHGVVHVTIRGLPKGNRPVILTYHDIGLNHKSC
FNAFFNFEDMQEITQHFAVCHVDAPGQQEGAPSFPTGYQYPTMDELAEMLPPVLTHLS
LKSIIGIGVGAGAYILSRFALNHPELVEGLVLINVDPCAKGWIDWAASKLSGLTTNVV
DIILAHHFGQEELQANLDLIQTYRMHIAQDINQDNLQLFLNSYNGRRDLEIERPILGQ
NDNKSKTLKCSTLLVVGDNSPAVEAVVECNSRLNPINTTLLKMADCGGLPQVVQPGKL
TEAFKYFLQGMGYVPSASMTRLARSRTHSTSSSLGSGESPFSRSVTSNQSDGTQESCE
SPDVLDRHQTMEVSC




SEQ ID NO: 255 


1220 bp 


NOV89d, 

CG56870-04 DNA Sequence 


ACTTCTTTCTTTTCTGTTTCAGAGTTACTGATTTATTCTTGAGATTCCTCTACTCTCG 


TTATCTGACCTCATGGATGAACTTCAGGATGTTCAGCTCACAGAGATCAAACCACTTC 
TAAATGATAAGAATGGTACAAGAAACTTCCAGGACTTTGACTGTCAGGAACATGATAT 
AGAAACAACTCATGGTGTGGTCCACGTCACTATAAGAGGCTTACCCAAAGGAAACAGA 
CCAGTTATACTAACATATCATGACATTGGCCTCAACCGTAAATCCTGTTTCAATGCAT
TCTTTAACTTTGAGGATATGCAAGAGATCACCCAGCACTTTGCTGTCTGTCATGTGGA
TGCCCCAGGCCAGCAGGAAGGTGCACCCTCTTTCCCAACAGGGTATCAGTACCCCACA 
ATGGATGAGCTGGCTGAAATGCTGCCTCCTGTTCTTACCCACCTAAGCCTGAAAAGCA
TCATTGGAATTGGAGTTGGAGCTGGAGCTTACATCCTCAGCAGATTTGCACTCAACCA 
TCCAGAGCTTGTGGAAGGCCTTGTGCTCATTAATGTTGACCCTTGCGCTAAAGGCTGG 
ATTGACTGGGCAGCTTCCAAACTCTCTGGCCTGACAACCAATGTTGTGGACATTATTT
TGGCTCATCACTTTGGGCAGGAAGAGTTACAGGCCAACCTGGACCTGATCCAAACCTA 
CAGAATGCATATTGCCCAAGACATCAACCAAGACAACCTGCAGCTCTTCTTGAATTCC
TACAATGGACGCAGAGACCTGGAGATCGAAAGACCCATACTGGGCCAAAATGATAACA 
AATCAAAAACATTAAAGTGTTCTACTTTACTGGTGGTAGGGGACAATTCGCCTGCAGT
TGAGGCTGTGATGGCGGACTGTGGGGGACTGCCCCAGGTAGTTCAGCCTGGGAAGTTC 
ACCGAGGCCTTCAAGTACTTTTTGCAGGGAATGGGCTACACACCATCTGCCAGCATGA 
CTCXMCTCGCCCGATCACGAACCCACTCAACCTCGAGTAGCCTCGGCTCTGGAGAAAG 
TCCCTTCAGCCGGTCTGTCACCAGCAATCAGTCAGATGGAACTCAAGAATCCTGTGAG
TCCCCTGATGTCCTGGACAGACACCAGACCATGGAGGTGTCCTGCTAAGCAGATGCTC 
CTCCCCTGGACCATTGCAAGTCCATCCTTCAAATGACCACTCCATAATATAACATTTC 


AT 




ORF Start: ATG at 71 


ORF Stop: TAA at 1148




SEQ ID NO: 256 


359 aa MW at 39652.2kD


NOV89d 

CG56870-04 Protein Sequence 


MDELQDVQLTEIKPLLNDKNGTRNFQDFDCQEHDIETTHGVVHVTIRGLPKGNRPVIL
TYHDIGLNRKSCFNAFFNFEDMQEITQHFAVCHVDAPGQQEGAPSFPTGYQYPTMDEL
AEMLPPVLTHLSLKSIIGIGVGAGAYILSRFALNHPELVEGLVLINVDPCAKGWIDWA
ASKLSGLTTNVVDIILAHHFGQEELQANLDLIQTYRMHIAQDINQDNLQLFLNSYNGR
RDLEIERPILGQNDNKSKTLKCSTLLVVGDNSPAVEAVMADCGGLPQVVQPGKFTEAF
KYFLQGMGYTPSASMTRLARSRTHSTSSSLGSGESPFSRSVTSNQSDGTQESCESPDV
LDRHQTMEVSC




SEQ ID NO: 257 


970 bp 


NOV89e, 

CG56870-05 DNA Sequence 


ATGGATGAACTTCAGGATGTTCAGCTCACAGAGATCAAACCACTTCTAAATGATAAGA
ATGGTACAAGAAACTTCCAGGACTTTGACTGTCAGTATCAGTACCCCACAATGGATGA
GCTGGCTGAAATGCTGCCTCCTGTTCTTACCCACCTAAGCCTGAAAAGCATCATTGGA 
ATTGGAGTTGGAGCTGGAGCTTACATCCTCAGCAGATTTGCACTCAACCATCCAGAGC
TTGTGGAAGGCCTTGTGCTCATTAATGTTGACCCTTGCGCTAAAGGCTGGATTGACTG 
GGCAGCTTCCAAACTCTCTGGCCTGACAACCAATGTTGTGGACATTATTTTGGCTCAT 
CACTTTGGGCAGGAAGAGTTACAGGCCAACCTGGACCTGATCCAAACCTACAGAATGC 
ATATTGCCCAAGACATCAACCAAGACAACCTGCAGCTCTTCTTGAATTCCTACAATGG 
GCGCAGAGACCTGGAGATCGAAAGACCCATACTGGGCCAAAATGATAACAAATCAAAA 
ACATTAAAGTGTTCTACTTTACTGGTGGTAGGGGACAATTCGCCTGCAGTTGAGGCTG 
TGGTCGAATGCAATTCCCGCCTGAACCCTATAAATACAACTTTGCTAAAGATGGCGGA 
CTGTGGGGGACTGCCCCAGGTAGTTCAGCCTGGGAAGCTCACCGAGGCCTTCAAGTAC 
TTTTTGCAGGGAATGGGCTACGTCCCGTCTGCCAGCATGACTCGGCTCGCCCGATCAC
GAACCCACTCAACCTCGAGTAGCCTCGGCTCTGGAGAAAGTCCCTTCAGCCGGTCTGT 
CACCAGCAATCAGTCAGATGGAACTCAAGAATCCTGTGAGTCCCCTGATGTCCTGGAC 
AGACACCAGACCATGGAGGTGTCCTGCTAAGCAGATGCTCCTCCCCTGGACCATTGCA 
AGTCCATCCTTCAAATGACCACTCCATAATATAACATTTCAT 




ORF Start: ATG at 1 


ORF Stop: TAA at 898 




SEQ ID NO: 258 


299 aa MW at 32956.9kD 


NOV89e, 

CG56870-05 Protein Sequence 


MDELQDVQLTEIKPLLNDKNGTRNFQDFDCQYQYPTMDELAEMLPPVLTHLSLKSIIG
IGVGAGAYILSRFALNHPELVEGLVLINVDPCAKGWIDWAASKLSGLTTNVVDIILAH
HFGQEELQANLDLIQTYRMHIAQDINQDNLQLFLNSYNGRRDLEIERPILGQNDNKSK
TLKCSTLLVVGDNSPAVEAVVECNSRLNPINTTLLKMADCGGLPQVVQPGKLTEAFKY
FLQGMGYVPSASMTRLARSRTHSTSSSLGSGESPFSRSVTSNQSDGTQESCESPDVLD
RHQTMEVSC 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 89B. 



Table 89B. Comparison of NOV89a against NOV89b through NOV89e. 


Protein Sequence 


NOV89a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV89b 


1..375 
1..375 


336/375 (89%) 
338/375 (89%) 


NOV89c 


1..375 
1..363 


326/375 (86%) 
326/375 (86%) 
NOV89d


1..375
1..359


321/375 (85%)
321/375 (85%)


NOV89e


104..375
28..299


233/272 (85%)
233/272 (85%)
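
The Identities/Similarities entries pair a residue count with a whole-number percentage. The sketch below parses such an entry and recomputes the percentage so that table values can be sanity-checked. It assumes that the percentage is truncated rather than rounded, which is consistent with entries such as 326/375 (86%) but is not stated in the source; `parse_fraction` and `percent_truncated` are hypothetical names.

```python
import re

FRACTION = re.compile(r"\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)")


def parse_fraction(text: str) -> tuple[int, int, int]:
    """Parse 'matches/aligned (pct%)' into (matches, aligned, pct)."""
    m = FRACTION.match(text)
    if m is None:
        raise ValueError(f"unrecognized entry: {text!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def percent_truncated(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated toward zero (assumed convention)."""
    return matches * 100 // aligned


# Entries from Table 89B; the check flags values that disagree with the count.
for entry in ["326/375 (86%)", "321/375 (85%)", "233/272 (85%)"]:
    matches, aligned, reported = parse_fraction(entry)
    print(entry, percent_truncated(matches, aligned) == reported)
```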



Further analysis of the NOV89a protein yielded the following properties shown in 
Table 89C. 



Table 89C. Protein Sequence Properties NOV89a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.1685 probability located in lysosome (lumen); 0.1000 
probability located in mitochondrial matrix space 


SignalP
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV89a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 89D. 



Table 89D. Geneseq Results for NOV89a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV89a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for
the Matched 
Region 


Expect 
Value 


AAM94019 


Human stomach cancer expressed 
polypeptide SEQ ID NO 108 - Homo 
sapiens, 363 aa. [WO200109317-A1, 
08-FEB-2001] 


1..375 
1..363 


360/375 (96%)
361/375 (96%)


0.0 


AAG64392 


Human reducing agent and 
tunicamycin-responsive protein 40 - 
Homo sapiens, 363 aa. 
[WO200155375-A1, 02-AUG-2001] 


1..375 
1..363 


360/375 (96%)
361/375 (96%)


0.0 


AAB94494 


Human protein sequence SEQ ID 
NO: 15186 - Homo sapiens, 363 aa. 
[EP1074617-A2, 07-FEB-2001] 


1..375 
1..363 


360/375 (96%) 
361/375 (96%) 


0.0 


AAU31598 


Novel human secreted protein #2089 - 
Homo sapiens, 395 aa. 
[WO200179449-A2, 25-OCT-2001] 


68..374 
1..307 


282/323 (87%)
286/323 (88%)


e-154 


AAB95462


Human protein sequence SEQ ID
[EP1074617-A2, 07-FEB-2001]


133..375
44..286


240/243 (98%)
242/243 (98%)


e-138







In a BLAST search of public sequence databases, the NOV89a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 89E. 



Table 89E. Public BLASTP Results for NOV89a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV89a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Portion


Expect 
Value 


Q9UGV2 


NDRG3 protein - Homo sapiens 
(Human), 375 aa. 


1..375 
1..375 


373/375 (99%) 
374/375 (99%) 


0.0 


Q96PL8 


NDR1-RELATED DEVELOPMENT
PROTEIN NDR3 - Homo sapiens
(Human), 375 aa.


1..375 
1..375 


372/375 (99%) 
373/375 (99%) 


0.0 


Q9QYF9 


NDRG3 protein (Ndr3 protein) - Mus 
musculus (Mouse), 375 aa. 


1..375 
1..375 


358/375 (95%) 
368/375 (97%) 


0.0 


AAH18504 


SIMILAR TO N-MYC 
DOWNSTREAM REGULATED 3 - 
Mus musculus (Mouse), 388 aa. 


1..375 
1..388 


359/388 (92%) 
368/388 (94%) 


0.0 


Q96SM2 


CDNA FLJ14759 FIS, CLONE
NT2RP3003290, MODERATELY 
SIMILAR TO MUS MUSCULUS 
NDR1 RELATED PROTEIN NDR3 -
Homo sapiens (Human), 363 aa.


1..375 
1..363


360/375 (96%) 
361/375 (96%) 


0.0 



PFam analysis predicts that the NOV89a protein contains the domains shown in the 
Table 89F. 



Table 89F. Domain Analysis of NOV89a 


Pfam Domain 


NOV89a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


Orn_Arg_deC_N: domain 1
of 1 


62..89


7/33 (21%) 
24/33 (73%) 


1.9 


abhydrolase: domain 1 of 1


87..310


142/239 (59%)


0.0066




Ndr: domain 1 of 1 


22..346


210/340(62%) 
311/340(91%) 


3.7e-211 



Example 90. 

The NOV90 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 90A. 



Table 90A. NOV90 Sequence Analysis 




SEQ ID NO: 259 


632 bp 


NOV90a, 

CG59764-01 DNA Sequence 


GAAACTATAAAGGGTCCGAACCCTCTTTTAAAGGATCCCAATGCATTTCTTTGATCCC 


TCGCCGGTGCGACGGTACCACCATCCCAGCTGTGAGGCTGCCATCAACACCCACATCA 
GCCTGGAGCTCCACGCATCCTATGTGTACCTGTCCATGGCCTTCTACTTCGACCAGGA
CGACGCGGCCCTGGAGCACTTTGACCGCTACTTCCTGCGCCAGTCGCAGGAGAAAAGG 
GAGCACGCCCAGGAGCTGATGAGCCTGCAGAACCTGCGCGGTGGCCGCATCTGCCTTC 
ATGACATCAGGAAGCCAGAGGGCCAAGGCTGGGAGAGCGGGCTCAAGGCCATGGAGTG 
CACCTTCCACCTGGAGAAGAACATCAACCAGAGCCTCCTGGAGCTGCACCAGCTGGCC 
AGGGAGAACGGCGACCCCCAGCTCTGCGACTTCCTGGAGAACGACTTCCTGAACCAGC
AGGCCAAGACCATCAAAGAGCTGGGTGGCTACCTGAGCAACCTGCACAAGATGGGGGC 
CCCGGAAGCAGGCCTGGCAGAGTACCTCTTTAACAAGCTCACCCTGGGCCGCAGCGAA
CCACTTCCTTGAACCAGCAGGCCAAGACCATCAAAGAGATTGGTGGCTACCT 




ORF Start: ATG at 41 


ORF Stop: TGA at 590 




SEQ ID NO: 260 


183 aa MW at 21159.6kD


NOV90a, 

CG59764-01 Protein Sequence 


MHFFDPSPVRRYHHPSCEAAINTHISLELHASYVYLSMAFYFDQDDAALEHFDRYFLR
QSQEKREHAQELMSLQNLRGGRICLHDIRKPEGQGWESGLKAMECTFHLEKNINQSLL
ELHQLARENGDPQLCDFLENDFLNQQAKTIKELGGYLSNLHKMGAPEAGLAEYLFNKL
TLGRSEPLP 



Further analysis of the NOV90a protein yielded the following properties shown in 
Table 90B. 



Table 90B. Protein Sequence Properties NOV90a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.1400 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV90a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 90C. 



Table 90C. Geneseq Results for NOV90a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #,
Date] 


NOV90a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Region


Expect
Value




AAU07889 


Polypeptide sequence for human 
hspG34a - Homo sapiens, 221 aa.
[WO200166752-A2, 13-SEP-2001]


7..180 
45..218 


159/174(91%) 
164/174(93%) 


4e-91 


AAU07890 


Polypeptide sequence for human 
hspG34b - Homo sapiens, 183 aa. 
[WO200166752-A2, 13-SEP-2001] 


6..177 
6..177 


125/172(72%) 
149/172 (85%) 


6e-70 


AAB90804 


Human shear stress-response protein 
SEQ ID NO: 108 - Homo sapiens, 183 
aa. [WO200125427-A1, 12-APR- 
2001] 


7..180 
7..180 


114/174(65%) 
141/174(80%) 


6e-64 


AAR71567 


Human monocyte growth factor - 
Homo sapiens, 183 aa. [JP07031482- 
A, 03-FEB-1995]


7..180 
7..180 


114/174(65%) 
141/174 (80%) 


6e-64 


AAU27741 


Mouse full-length polypeptide 
sequence #66 - Mus musculus, 182 aa.
[WO200164834-A2, 07-SEP-2001]


6..180 
6..180 


112/175(64%) 
141/175 (80%) 


5e-63 



In a BLAST search of public sequence databases, the NOV90a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 90D. 



Table 90D. Public BLASTP Results for NOV90a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV90a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9BXU8 


Ferritin heavy polypeptide-like 17 - 
Homo sapiens (Human), 183 aa. 


6..177 
6..177 


125/172(72%) 
149/172 (85%) 


2e-69 


P29389 


Ferritin heavy chain (Ferritin H 
subunit) - Cricetulus griseus 
(Chinese hamster), 185 aa. 


6..180 
10..184 


115/175 (65%) 
142/175 (80%) 


6e-64 


A26886 


ferritin heavy chain - chicken, 180 
aa. 


6..180 
5..179 


112/175 (64%) 
142/175 (81%) 


1e-63


P08267 


Ferritin heavy chain (Ferritin H
subunit) - Gallus gallus (Chicken), 
179 aa. 


6..180 
4..178 


112/175 (64%) 
142/175 (81%) 


1e-63


Q95MP7 


FERRITIN - Canis familiaris (Dog), 
183 aa. 


6..180
6..180


112/175 (64%) 
143/175 (81%) 


2e-63 
PFam analysis predicts that the NOV90a protein contains the domains shown in the
Table 90E. 



Table 90E. Domain Analysis of NOV90a 


Pfam Domain 


NOV90a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


Bacteriofer: domain 1 of 
1 


14..159 


35/172 (20%) 
76/172 (44%) 


6.7 


ferritin: domain 1 of 1 


17..173 


92/161 (57%) 
138/161 (86%) 


4.7e-87 



Example 91. 

The NOV91 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 91A.



Table 91A. NOV91 Sequence Analysis 




SEQ ID NO: 261


487 bp 


NOV91a, 

CG59710-01 DNA Sequence 


TGCTGTGCGTTGTCTTTCCTTCTCACTCAAGCCTGTGAAATCTCTCTTTCAGGTTGAC 


AGACTAATGGAGTTCCATTTTAAATATCTGGGTGCAATGCAGGTGGCGGACAAGAAGA 


TTGAAGGGGAAAAACACGACATGGTCCGGCGAGGAGAGATCATCGACAATGACACCGA 
GGAGGAGTTCTACCTCCGGCGCCTGGATGCGGGGCTCTTTGTTCTCCAGCACATCTGC 
TACATCATGGCCGAGATCTGCAATGCCAATGTCCCCCAGATTCGCCAGAGGGTTCACC 
AGATCCTAAACATGCGAGGAAGCTCCATCAAAATTGTCAGGCATATCATCAAGGAGTA 
TGCAGAGAACATCGGGGACGGCCGGAGCCCGGAGTTCCGGGAGAACGAGCAAAAGCGC 
ATCCTGGGCTTGCTGGAGAACTTCTAGAGGCACCTTGGCCCTGCGCATCATGGACTCT 




CTCAGCTTCCCTCCCAGGATCAG 




ORF Start: ATG at 65 


ORF Stop: TAG at 431 




SEQ ID NO: 262 


122 aa 


MW at 14385.4 Da 


NOV91a, 

CG59710-01 Protein Sequence 


MELHFKYLGAMQVADKKIEGEKHDMVRRGEIIDNDTEEEFYLRRLDAGLFVLQHICYI 
MAEICNANVPQIRQRVHQILNMRGSSIKIVRHIIKEYAENIGDGRSPEFRENEQKRIL 
GLLENF 
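
The reported ORF coordinates and the reported polypeptide length can be cross-checked with simple arithmetic. The sketch below assumes that "ORF Start" and "ORF Stop" give the 1-based position of the first base of the start and stop codons; the function name is an illustrative assumption.

```python
def orf_codon_count(start: int, stop: int) -> int:
    """Number of coding codons between the first base of the start codon
    and the first base of the stop codon (1-based positions).
    The stop codon itself is not counted."""
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3

if __name__ == "__main__":
    # NOV91a: ORF Start ATG at 65, ORF Stop TAG at 431.
    print(orf_codon_count(65, 431))
```

Under that assumption the NOV91a coordinates give 122 codons, which agrees with the 122 aa length stated for SEQ ID NO: 262.
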



Further analysis of the NOV91a protein yielded the following properties shown in 
Table 91B. 



Table 91B. Protein Sequence Properties NOV91a 


PSort 
analysis: 


0.6500 probability located in cytoplasm; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.0000 probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
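
The PSort line above lists candidate compartments with a probability for each. As a rough illustration, the text can be parsed into a mapping and the top-scoring compartment selected; the helper names below are hypothetical and the parsing pattern simply mirrors the wording used in the table.

```python
import re

def parse_psort(text: str) -> dict[str, float]:
    """Map each predicted location to its reported probability."""
    pattern = re.compile(r"(\d\.\d+)\s+probability located in\s+([^;]+)")
    return {loc.strip(): float(p) for p, loc in pattern.findall(text)}

def most_likely_location(text: str) -> str:
    """Return the location with the highest reported probability."""
    scores = parse_psort(text)
    return max(scores, key=scores.get)

if __name__ == "__main__":
    # PSort text copied from Table 91B.
    psort_text = (
        "0.6500 probability located in cytoplasm; "
        "0.1000 probability located in mitochondrial matrix space; "
        "0.1000 probability located in lysosome (lumen); "
        "0.0000 probability located in endoplasmic reticulum (membrane)"
    )
    print(most_likely_location(psort_text))
```
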
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A search of the NOV91a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 91C. 



Table 91C. Geneseq Results for NOV91a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV91a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAU28058 


Novel human secretory protein, Seq 
ID No 227 - Homo sapiens, 518 aa. 
[WO200166689-A2, 13-SEP-2001] 


1..122 
397..518 


122/122 (100%) 
122/122 (100%) 


1e-66 


AAM93729 


Human polypeptide, SEQ ID NO: 
3689 - Homo sapiens, 563 aa. 
[EP1130094-A2, 05-SEP-2001] 


1..122 
442..563 


122/122(100%) 
122/122(100%) 


1e-66 


AAB63116 


Human secreted protein sequence 
encoded by gene 39 SEQ ID NO: 126 
- Homo sapiens, 401 aa. 
[WO200061748-A1, 19-OCT-2000] 


1..119 
283..401 


119/119 (100%) 
119/119 (100%) 


1e-64 


AAU28246 


Novel human secretory protein, Seq 
ID No 603 - Homo sapiens, 360 aa. 
[WO200166689-A2, 13-SEP-2001] 


1..118 
197..316 


104/120 (86%) 
106/120 (87%) 


2e-51 


ABB21673 


Protein #3672 encoded by probe for 
measuring heart cell gene expression 
- Homo sapiens, 32 aa. 
[WO200157274-A2, 09-AUG-2001] 


24..55 
1..32 


32/32 (100%) 
32/32 (100%) 


1e-11 
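
Each row of the table above gives two residue ranges, one on the query and one on the matched sequence. A small sketch for turning a range such as 1..122 into a length is shown here; the function name is an assumption for illustration. Where the two spans differ (for example 1..118 against 197..316), the aligned length in the identities column presumably includes gap positions.

```python
def span_length(cell: str) -> int:
    """Length of an inclusive residue range written as 'start..end'."""
    start, end = (int(x) for x in cell.split(".."))
    if end < start:
        raise ValueError(f"range runs backwards: {cell!r}")
    return end - start + 1

if __name__ == "__main__":
    # Ranges copied from the AAU28058 and AAU28246 rows of Table 91C.
    for query, match in (("1..122", "397..518"), ("1..118", "197..316")):
        print(query, span_length(query), match, span_length(match))
```
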



In a BLAST search of public sequence databases, the NOV91a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 91D. 
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Table 91D. Public BLASTP Results for NOV91a 


Accession 
Number 


Protein/Organism/Length 


NOV91a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96KD2 


TESTES DEVELOPMENT- 
RELATED NYD-SP19 - Homo 


1..122 
255..376 


122/122 (100%) 
122/122 (100%) 


5e-66 


Q9H7A5 


CDNA: FLJ21108 FIS, CLONE 
CAS05257 - Homo sapiens 
(Human), 225 aa. 


1..122 
104..225 


121/122 (99%) 


5e-65 


O62703 


P14 - Bos taurus (Bovine), 122 aa. 


1..122 
1..122 


116/122(95%) 
119/122 (97%) 


2e-62 


Q9CWL8 


5730471K09RIK PROTEIN - Mus 
musculus (Mouse), 563 aa. 


1..122 
442..563 


115/122(94%) 
118/122 (96%) 


3e-62 


Q9Y3M7 


DJ633O20.1 (P14L, SIMILAR TO 
BOS TAURUS P14) - Homo sapiens 
(Human), 284 aa (fragment). 


1..93 
192..284 


93/93 (100%) 
93/93 (100%) 


3e-48 
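
Expect values such as those in the table above are commonly used to rank hits and to discard weak ones. The following sketch sorts hits by Expect value and keeps those at or below a cutoff; the cutoff of 1e-10 and the "HYPOTHETICAL_WEAK_HIT" entry are illustrative assumptions and do not come from the original analysis.

```python
def significant_hits(hits, threshold=1e-10):
    """Return accessions whose Expect value is <= threshold,
    ordered from most to least significant."""
    parsed = [(acc, float(e)) for acc, e in hits]
    kept = [(acc, e) for acc, e in parsed if e <= threshold]
    return [acc for acc, _ in sorted(kept, key=lambda pair: pair[1])]

if __name__ == "__main__":
    hits = [
        ("Q9Y3M7", "3e-48"),              # from Table 91D
        ("Q96KD2", "5e-66"),              # from Table 91D
        ("Q9CWL8", "3e-62"),              # from Table 91D
        ("HYPOTHETICAL_WEAK_HIT", "6.7"), # made-up entry for illustration
    ]
    print(significant_hits(hits))
```
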



PFam analysis predicts that the NOV91a protein contains the domains shown in 
Table 91E. 



Table 91E. Domain Analysis of NOV91a 



Pfam Domain 



NOV91a Match Region 



Identities/ 
Similarities 
for the Matched Region 



Expect Value 



No Significant Matches Found 



Example 92. 

The NOV92 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 92A. 



Table 92A. NOV92 Sequence Analysis 




SEQ ID NO: 263 6527 bp 


NOV92a, 

CG59754-02 DNA Sequence 


CCACAGAGGGGAAATGCCAGCTTCCCTCTCCCTGGGGCTCCGTGCCCCCTCTGATCCA 


GCCCTTCGAATTCCCACCCGCCTCCATCGGCCAGCTGCTCTACATTCCCTGTGTGGTG 


TCCTCGGGGGACATGCCCATCCGTATCACCTGGAGGAAGGACGGACAGGTGATCATCT 
CAGGCTCGGGCGTGACCATCGAGAGCAAGGAATTCATGAGCTCCCTGCAGATCTCTAG 
CGTCTCCCTCAAGCACAACGGCAACTATACATGCATCGCCAGCAACGCAGCCGCCACC 
GTGAGCATTGTGTCTCCAGAACACAGGTTTTTTATTACCTACCACGGCGGGCTGTACA 
TCTCTGACGTACAGAAGGAGGACGCCCTCTCCACCTATCGCTGCATCACCAAGCACAA 
GTATAGCGGGGAGACCCGGCAGAGCAATGGGGCACGCCTCTCTGTGACAGACCCTGCT 
GAGTCGATCCCCACCATCCTGGATGGCTTCCACTCCCAGGAAGTGTGGGCCGGCCACA 
CCGTGGAGCTGCCCTGCACCGCCTCGGGCTACCCTATCCCCGCCATCCGCTGGCTCAA 
GGATGGCCGGCCCCTCCCGGCTGACAGCCGCTGGACCAAGCGCATCACAGGGCTGACC 
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ATCAGCGACTTGCGGACCGAGGACAGCGGCACCTACATTTGTGAGGTCACCAACACCT 
TCGGTTCGGCAGAGGCCACAGGCATCCTCATGGTCATTGATCCCCTTCATGTGACCCT 
GACACCAAAGAAGCTGAAGACCGGCATTGGCAGCACGGTCATCCTCTCCTGTGCCCTG 
ACGGGCTCCCCAGAGTTCACCATCCGCTGGTATCGCAACACGGAGCTGGTGCTGCCTG 
ACGAGGCCATCTCCATCCGCGGGCTCAGCAACGAGACGCTGCTCATCACCTCGGCCCA 
GAAGAGCCATTCCGGGGCCTACCAGTGCTTCGCTACCCGCAAGGCCCAGACCGCCCAG 
GACTTTGCCATCATTGCACTTGAGGATGGCACGCCCCGCATCGTCTCGTCCTTCAGCG 
AGAAGGTGGTCAACCCCGGGGAGCAGTTCTCACTGATGTGTGCGGCCAAGGGCGCCCC 
GCCCCCCACGGTCACCTGGGCCCTCGACGATGAGCCCATCGTGCGGGATGGCAGCCAC 
CGCACCAACCAGTACACCATGTCGGACGGCACCACCATCAGCCACATGAACGTCACAG 
GCCCCCAGATCCGCGACGGGGGCGTGTACCGGTGCACAGCGCGGAACTTGGTGGGCAG 
TGCTGAATATCAGGCGCGAATAAACGTAAGAGGCCCACCCAGCATCCGGGCTATGCGG 
AACATCACAGCAGTCGCCGGGCGGGACACCCTTATCAACTGCAGGGTCATCGGCTATC 
CCTACTACTCCATCAAGTGGTACAAGGATGCCTTGCTGCTGCCAGACAACCACCGCCA 
GGTGGTGTTTGAGAATGGGACCCTCAAGCTGACTGACGTGCAGAAGGGCATGGATGAG 
GGGGAGTACCTGTGCAGTGTCCTCATCCAGCCCCAGCTCTCCATCAGCCAGAGCGTTC 
ACGTAGCCGTCAAAGTGCCCCCTCTGATCCAGCCCTTCGAATTCCCACCCGCCTCCAT 
CGGCCAGCTGCTCTACATTCCCTGTGTGGTGTCCTCGGGGGACATGCCCATCCGTATC 
ACCTGGAGGAAGGACGGACAGGTGATCATCTCAGGCTCGGGCGTGACCATCGAGAGCA 
AGGAATTCATGAGCTCCCTGCAGATCTCTAGCGTCTCCCTCAAGCACAACGGCAACTA 
TACATGCATCGCCAGCAACGCAGCCGCCACCGTGAGCCGGGAGCGTCAGCTCATCGTG 
CGTGTGCCCCCTCGATTTGTGGTGCAACCCAACAACCAGGATGGCATCTACGGCAAAG 
CTGGTGTGCTCAACTGCTCGGTGGACGGCTACCCCCCACCCAAGGTCATGTGGAAGCA 
TGCCAAGGGGAGCGGGAACCCCCAGCAGTACCACCCTGTGCCCCTCACTGGCCGCATC 
CAGATCCTGCCCAACAGCTCGCTGCTGATCCGCCACGTCCTAGAAGAGGACATCGGCT 
ACTACCTCTGCCAGGCCAGCAACGGCGTAGGCACCGACATCAGCAAGTCCATGTTCCT 
CACAGTCAAGATCCCGGCCATGATCACTTCCCACCCCAACACCACCATCGCCATCAAG 
GGCCATGCX3AAGGAGCTAAACTGCACGGCACGGGGTGAGCGGCCCATCATCATCCGCT 
GGGAGAAGGGGGACACAGTCATCGACCCTGACCGCGTCATGCGGTATGCCATCGCCAC 
CAAGGACAACGGCGACGAGGTCGTCTCCACACTGAAGCTCAAGCCCGCTGACCGTGGG 
GACTCTGTGTTCTTCAGCTGCCATGCCATCAACTCGTATGGGGAGGACCGGGGCTTGA 
TCCAACTCACTGTGCAAGAGCCCCCCGACCCCCCAGAGCTGGAGATCCGGGAGGTGAA 
GGCCCGGAGCATGAACCTGCGCTGGACCCAGCGATTCGACGGGAACAGCATCATCACG 
GGCTTCGACATTGAATACAAGAACAAATCAGATTCCTGGGACTTCAAGCAGTCCACAC 
GCAACATCTCCCCCACCATCAACCAGGCCAACATTGTGGACTTGCACCCGGCATCTGT 
GTACAGCATCCGCATGTACTCTTTCAACAAGATTGGCCGCAGTGAACCAAGCAAGGAG 
CTCACCATCAGCACTGAGGAGGCCGCTCCCGATGGGCCCCCCATGGATGTTACCTTGC 
AGCCAGTGACCTCACAGAGCATCCAGGTGACCTGGAAGGCACCCAAGAAGGAGCTGCA 
GAACGGTGTCATCCGGGGCTACCAGATTGGCTACAGAGAGAACAGCCCCGGCAGCAAC 
GGGCAGTACAGCATCGTGGAGATGAAGGCCACGGGGGACAGCGAGGTCTACACCCTGG 
ACAACCTCAAGAAGTTCGCCCAGTATXKGGTGGTGGTCCAAGCCTTCAATCGGGCTGG 
CACGGGGCCCTCTTCCAGCGAGATCAATGCCACCACTCTGGAGGATGTGCCCAGCCAG 
CCCCCTGAGAACGTCCGGGCCCTGTCCATCACTTCTGACGTGGCCGTCATCTCCTGGT 
CAGAGCCCCCGCGCAGCACCCTCAATGGCGTCCTCAAAGGCTATCGGGTCATCTTCTG 
GTCCCTCT ATGTTG ATGGGGAGTGGGG CGAGATGCAG AACAT CACCACCACGCGGG AG 
CCGGTGGAGCTGCGGGGCATGGAGAAGTTCACCAACTACAGCGTCCAGGTGCTGGCCT 
ACACCCAGGCTGGGGACGGCGTACGCAGCAGTGTGCTCTACATCCAGACCAAGGAGGA 
CGTTCCAGGTCCCCCTCCTGGCATCAAAGCTGTCCCTTCATCAGCTAGCAGTGTGGTT 
GTGTCTTGGCTCCCCCCTACCAAGCCCAACGGGGTGATCCGCAAGTACACCATCTTCT 
GTTCCAGCCCCGGGTCTGGCCAGCCGGCTCCCAGCGAGTACGAGACGAGTCCAGAGCA 
GCTCTTCTACCGGATCGCCCACCTAAACCGCGGTCAGCAGTATCTGCTGTGGGTGGCC 
GCCGTCACCTCTGCCGGCCGGGGCAACAGCAGCGAGAAGGTGACCATCGAGCCTGCTG 
GCAAGGCCCCAGCAAAGATCATCTCCTTTGGGGGCACCGTGACAACACCTTGGATGAA 
AGATGTTCGGCTGCCTTGCAATTCAGTGGGAGATCCAGCCCCTGCTGTGAAGTGGACC 
AAGG AC AGTG AAGACTCGGCC ATTCC AGTGTCCATGGATGGG CACCGG CTCATCCACA 
CCAATGGCACACTGCTGCTGCGTGCAGTGAAGGCTGAGGACTCTGGCTACTACACGTG 
CACGGCCACCAACACTGGTGGCTTTGACACCATCATCGTCAACCTTCTGGTGCAAGTT 
CCCCTOGACCAGCCCOKrCTCACTGTCTCCAAAACCTCAGCTTCGTCCATCACCCTGA 
CCTGGATTCCAGGTCACAATGGGGGCAGCTCCATCCGAGGCTTCGTGCTACAGTACTC 
GGTGGACAACAGCGAGGAGTGGAAGGATGTGTTCATCAGCTCCAGCGAGCGCTCCTTC 
AAGCTGGACAGCCTCAAGTGTGGCACGTGGTACAAGGTGAAGCTGGCAGCCAAGAACA 
GCGTGGGCTCTGGGCGCATCAGCGAGATCATCGAGGCCAAGACCCACGGGCGGGAGCC 
CTCCTTCAGCAAAGACCAACACCTCTTCACCCACATCAACTCCA(^CATGCTCGGCTT 
AACCTGCAGGGCTGGAACAATGGGGG CTGCCCTATCACAG CCATCGTTCTGGAGTACC 
GGCCCAAGGGGACCTGGGCCTGGCAGGGCCTCCX^GCCAACAGCTCCGGGGAGGTCTT 
TCTCACGGAACTGTOAGAGGCCACGTGGTATCAGCTGCGCATCAGGGCTTGCAACAGT 
GCGGGCTGCGGCAATGAAACAGCCCAGTTCGCCACCCTGGACTACGATGGCAGCACCA 
TTCCACCCATCAAGTCTGCTCAAGGTGAAGGGGATGATGTGAAGAAGCTGTTCACCAT 
CGGCTGCCCn^TCATCCTGGCCACACTGGGGGTGGCACTXSCTCTTCATCGTACGCAAG 
AAGAGGAAGGAGAAACGGCTGAAGCGACTCCGAGATGCAAAGAGTTTGGCAGAAATGT 
TG AT AAGCAAG AACAAT AG AAG CTTTG ACACCCCTGTGAAAGGGCCACCC CAGGGCCC 
ACGGCTACACATTGACATCCCCAGGGTCCAGCTGCTCATCGAGGACAAAGAAGGCATC 
AAG CAACTGGG AG ATG ACAAGGCCACCATC C CTGTG ACAG ATGCTG AGTT CAG C C AAG 
CTGTCAACCCACAGAGCTTCTGTACTGGCGTCTCCTTGCACCACCCAACCCTCATCCA 
GAGCACAGGACCCCTCATCGACATGTCTGACATCCGGCCAGGAACCAATCCAGTGTCC 
AGGAAGAATGTGAAGTCAGCCCACAGCACCCGGAACCGGTACTCAAGCCAGTGGACCC 
TGACCAAGTGCCAGGCCTCCACACCTGCCCGCACCCTCACCTCCGACTGGCGCACCGT 
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GGGCTCCCAGCATGGTGTCACGGTCACTGAGAGTGACAGCTACAGTGCCAGCCTGTCC 
CAGGACACAGACAAAGGAAGGAACAGCATGGTGTCCACTGAGAGTGCCTCTTCCACCT 
ACG AGGAG CTGG CCCGGGCCT ATGAGCATG CCAAG CTGGAGG AG CAG CTGCAGCACGC 
CAAGTTTGAGATCACCGAGTGCTTCATCTCTGACAGTTCCTCTGACCAGATGACCACA 
GGCACCAACGAGAACGCCGACAGCATGACATCCATGAGCACACCCTCAGAGCCTGGCA 
TCTGCCGCTTTACCGCCTCACCACCCAAGCCCCAGGATGCGGACCGGGGCAAAAACGT 
GGCTGTGCCCATCCCTCACCGGGCCAACAAGAGTGACTACTGCAACCTGCCCCTGTAT 
GCCAAGTCAGAGGCCTTCTTTCGAAAGGCAGATGGACGTGAGCCCTGCCCCGTGGTCC 
CACCCCGTGAGGCCTCCATCCGGAACCTGGCTCGAACCTACCACACCCAGGCTCGCCA 
CCTGACCCTGGACCCTGCC^GCAAGTCCTTGGGCCTTCCCCACCCAGGGGCCCCCGCT 
GCCGCCTCCACAGCCACCTTACCTCAGAGGACTCTGGCCATGCCAGCCCCCCCAGCCG 
GCACAGCCCCCCCAGCCCCCGGCCCCACCCCTGCTGAGCCACCCACCGCCCCCAGCGC 
TGCCCCTCCGGCCCCCAGCACCGAGCCTCCACGAGCCGGGGGCCCACACACCAAAATG 
GGGGGCTCCAGGGACTCGCTTCTCGAGATGAGCACATCGGGGGTAGGGAGGTCTCAGA 
AGCAGGGGGCCGGGGCCTACTCCAAATCCTACACCCTGGTGTAGGGCCGGCAGGAAGA 
GCAGCCACGCCTGGGCCGGGCCGCGCCGCAGCCCCACACGCCAGCTCGGCTGTTTTTC 


TGCATTATTTATATTCAACTGACAGACAAAAACCAACCAACGACAAAACAAAAACCCC 


CAATCATGAACGCCTGTACATAGAACTCTTTTGTACAAATGAAACTATTTTCTTCTTC 


TCCATGAAGCCAGGGCACAAAGAATTTGACAGTACAAGTCAAATCCCCCACCCCACAA 


AATATGTGTGGAGATATATATACATATATAGACAGACAGGAACGCCTCCACGAGCTAT 


ATATCTATATATTTCTCTCACCCTATTTTGAGACAGAGGCACAAAGACTCAGCAATTT 


TTTTCCCTCCTCCTCACCTTCCCCCCAGTCTAGGTGGTTTTGACAAAGACCAAAATCC 


CAACTCAGAGACACTGCATGCGATTTTACTGTTCCAAGAAAACCAGGAGTTGCTTCAA 


TTTGCAGATGCTTATGTGTTAATACCTTTTTCTATGAAAAAAGACCCAGCGCCGTGTG 


CAATAAAGGT.TATGTTTC CAAAAAAAAGCTT 




ORF Start: ATG at 129 


ORF Stop: TAG at 5958 




SEQ ID NO: 264 


1943 aa MW at 211904.3 Da 


NOV92a, 

CG59754-02 Protein Sequence 


MPIRITWRKDGQVIISGSGVTIESKEFMSSLQISSVSLKHNGNYTCIASNAAATVSIV 
S P EHRF F I TYHGGL Y I S DVQKED ALS TYRC I TKHKY SGETRQSNGARLS VTD P A£S I P 
TI LDGFHSQEVWAGHTVE LPCTASG YPI PAI RWLKDGRPLPADSRWTKRITGLT I SDL 
RTEDSGTYICEVTNTFGSAEATGILMVIDPLHVTLTPKKliKTGIGSTVILSCALTGSP 
EFTI RWYRNTELVLPDEAI S I RGLSNETLLI TS AQKSHSGAYQC FATRKAQTAQDFAI 
IALEDGTPRIVSSFSEKWNPGEQFSLMCAAKGAPPPTVTWALDDEPIVRDGSHRTNQ 
YTMS DGTTI SHMNVTGPQ I RDGGVYRCTARNLVGSAE YQARINVRGPPSI RAMRNI TA 
VAGRDTLINCRVIGYPYYSIKWYKDALLLPDNHRQWFENGTLKLTDVQKGMDEGEYL 
CSVLIQPQLSISQSVHVAVICVPPLIQPFEFPPASIGQLLYIPCWSSGDMPIRITWRK 
DGQVI I SGSGVTI ESKEFMSSLQI SSVSLKHNGNYTCI ASNAAATVSRERQLI VRVPP 
RFWQPNNQDG I YGKAG VLNCS VDG YPPPKVMWKHAKGSGNPQQYHPVPLTGR IQI LP 
NS SLLI RHVLEEDIG YYLCQASNGVGTD I S KSMFLTVKI PAMITSH PNTT I AI KGHAK 
ELNCTARGERPI I IRWEKGDTVIDPDRVMRYAI ATKDNGDEWSTLKLKPADRGDSVF 
FSCHAI NS YGEDRGLI QLTVQE PPDPPELE I RE VKARSMNLRWTQRFDGNS 1 1 TGFDI 
E YKNKSDSWDFKQSTRNI S PTI NQANI VDLHPASVYS I RMYS FNKIGRSE PSKELTI S 
TEEAAPDGPPMDVTLQPVTSQS I QVTWKAPKKELQNGVI RG YQIGYRENS PGSNGQYS 
IVEMKATGDSEVYTLDNLKKFAQYGVWQAFNRAGTGPSSSEINATTLEDVPSQPPEN 
VRALS I TSDVAVI S WS EP PRSTLNGVLKGYRVI FWS LYVDGEWGEMQN I TTTRERVEL 
RGMEKFTNYSVQVLAYTQAGDGVRSSVLYIQTKEDVPGPPAGIKAVPSSASSVVVSWL 
PPTKPNGVIRKYTIFCSSPGSGQPAPSEYETSPEQLFYRIAHLNRGQQYLLWVAAVTS 
AGRGNSSEKVTI EPAGKAPAKI ISFGGTVTTPWMKDVRLPCNSVGDPAPAVKWTKDSE 
DSAI P VSMDGHRLI HTNGTLLLRA VKAE DS G YYTCTATNTGG FDT 1 1 VNLLVQVP PDQ 
PRLTVS KT S AS S ITLTW I PGDNGGS S I RGFVLQ Y S VDN S E EWKD VF I S SS ERS FKLD S 
IJCCGTWYKVKLAAKNSVGSGRISEIIEAIO'HGREP 

WNNGGCP I T AI VLE YR PKGTW AWQGLRANS SG EVFLTE LREATWYE LRMRACNS AGCG 
NETAQFATLDYDGSTI PPIKSAQGEGDDVKKLFTIGCPVI LATLGVALLFI VRKKRKE 
KRLKRLRDAKS LAEMLI S KNNRS FDT PVKGPPQG PRLH ID I PRVQLLI EDKEG I KQLG 
DDKATI PyTDAEFSQAVNPQSFCTGVSLHHPTLIQSTGPLIDMSDI RPGTNPVSRKNV 
KSAHSTRNRYS SQWTLTKCQASTPARTLTSDWRTVGSQHGVTVTESDS YSAS LSQDTD 
KGRNSMVSTE SASSTY EELARAYEHAKLEEQLQHAKFE ITEC F I SDSSSDQMTTGTNE 
NADSMTSMSTPSEPGICRFTASPPKPQDADRGKNVAVPIPHRANKSDYCNLPLYAKSE 
AF FRKADGRE PC PW P PREAS I RNLART YHTQ ARHLTLDP AS ICS LG LPHPG APAAAS T 
ATLPQRTLAMPAPPAGTAPPAPGPTPAEPPTAPSAAPPAPSTEPPRAGGPHTKMGGSR 
DSLLEMSTSGVGRSQKQGAGAYSKSYTLV 
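
The relationship between a DNA sequence and its predicted polypeptide can be illustrated with a minimal translation routine. The sketch below assumes the standard genetic code and translates from a given start until the first stop codon; the function and variable names are illustrative assumptions. The example input is the stretch of the NOV92a DNA sequence beginning at the reported ORF start (ATG at 129).

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate DNA in frame from its first base; stop at the first
    stop codon or when fewer than three bases remain."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

if __name__ == "__main__":
    # First 39 bases of the NOV92a ORF, copied from Table 92A.
    print(translate("ATGCCCATCCGTATCACCTGGAGGAAGGACGGACAGGTG"))
```

The translated residues correspond to the opening residues of the NOV92a protein sequence listed above.
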




SEQ ID NO: 265 


6049 bp 


NOV92b, 

CG59754-01 DNA Sequence 


CCACAGAGGGGAAATGCCAGCTTCCCTCTCCCTGGGGCTCCGTGCCCCCTCTGATCCA 


GCCCTTCGAATTCCCACCCGCCTCCATCGGCCAGCTGCTCTACATTCCCTGTGTGGTG 


TCCTCGGGGGACATGCCCATCCGTATCACCTGGAGGAAGGACGGACAGGTGATCATCT 
CAGG CTCGGG CGTG ACCATCGAGAGCAAGGAATTCATGAGCTCCCTGCAG AT CTCT AG 
CGTCTCCCTCAAGCACAACGGCAACTATACATGCATCGCCAGCAACGCAGCCGCCACC 
GTGAGCATTGTGTCTCC^GAACACAGGTTTTTTATTACCTACCACGGCGGGCTGTACA 
TCTCTG ACGTACAG AAGGAGGACG CCCTCTCCACCT ATCGCTG CATCACCAAGCACAA 
GTATAGCGGGGAGACCCGGCAGAGCAATGGGGCACGCCTCTCTGTGACAGACCCTGCT 
GAGTCGATCCCCACCATCCTGGATGGCTTCCACTCCCAGGAAGTGTGGGCCGGCCACA 
CC^TGGAGCTGCCCTGCACCGCCTCCKMCTACCCT^ 

GGATGGCCGGCCCCTCCCGGCTGACAGCCGCTGGACCAAGCGCATCACAGGGCTGACC 
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at(^gcgacttgcxx;accgaggacagcggcacctacatttgtgaggtcaccaacacct 
tcggttcggcagaggccacaggcatcct catggtcattg atccc cttcatgtgacc ct 
gacaccaaagaagctgaagaccggcattggcagcacggtcatcctctcctgtgccctg 

ACGGGCTCCCCAGAGTTCACCATCCGCTGGTATCGCAACACGGAGCTGGTGCTGCCTG 

ACGAGGCCATCTCCATCCGCGGGCTCAGCAACGAGACGCTGCTCATCACCTCGGCCCA 

GAAGAGCCATTCCGGGGCCTACCAGTGCTTCGCTACCCGCAAGGCCCAGACCGCCCAG 

GACTTTGCCATCATTGCACTTGAGGATGGCACGCCCCGCATCGTCTCGTCCTTCAGCG 

AGAAGGTGGT CAAC CCCGGGGAGC AGTTCT CACTGATGTGTGCGGCC AAGGGCGCCCC 

GCCCCCCACAGTCACCTGGGCCCTCGACX3ATGAGCCCATCGTGCGGGATGGCAGCCAC 

CG CACC AACCAGTACACCATGTCGG ACGGCACCAC CATC AGCCACATGAACGTCACAG 

GCCCCCAGATCCGCGACGGGGGCGTGTACCGGTGCACAGCGCGGAACTTGGTGGGCAG 

TG CTG AAT ATC AGGCGCGAATAAACGTAAG AGG CCCACCC AGC ATCCGGG CT ATGCGG 

AACATCACAGCAGTCGCCGGGCGGGACACCCTTATCAACTGCAGGGTCATCGGCTATC 

CCTACT ACTCCATCAAGTGGTAC AAGG ATG CCTTGCTG CTGCCAGACAACCACCGCCA 

GGTGGTGTTTGAGAATGGGACCCTCAAGCTGACTGACGTGCAGAAGGGCATGGATGAG 

GGGG AGTACCTGTGCAGTGTCCTCATCCAG CCC CAGCT CTCCAT CAGCCAGAGCGTTC 

ACGTAGCCGTCAAAGTGCCCCCTCTGATCCAGCCCTTCGAATTCCCACCCGCCTCCAT 

CGGC CAG CTGCTCTACATTCCCTGTGTGGTGTC CTCGGGGGACATGCCCATCCGT ATC 

ACCTGGAGGAAGGACGGACAGGTGATCATCTCAGGCTCGGGCGTGACCATCGAGAGCA 

AGGAATTCATGAGCTCCCTGCAGATCTCTAGCGTCTCCCTCAAGCACAACGGCAACTA 

TACATGCATCGCCAGCAACGCAGCCGCCACCGTGAGCCGGGAGCGTCAGCTCATCGTG 

CGTGTGCCCCCTCGATTTGTGGTGCAACCCAACAACCAGGATGGCATCTACGGCAAAG 

CTGGTGTG CTCAACTG CTCGGTGGACiSGCTACCCCCCACCCAAGGTCATGTGGAAGCA 

TGCCAAGGGTAGCGGGAACCCCCAGCAGTACCACCCTGTGCCCCTCACTGGCCGCATC 

CAG ATC CTGCCCAACAGCTCGCTG CTGATCCGCCACGTCCT AG AAG AGGACATCGGCT 

A CT A CCT CTG CCAGG C CAG CAACG G CG TAG G CAC CGACAT C AGCAAGTCCATG TT C CT 

CACAGTCAAGATCCCCACCATCCTGGATGGCTTCCACTCCCAGGAAGTGTGGGCCGGC 

C^CACCGTGGAGCTGCCCTGCACCGCCTCGGGCTACCCTATCCCCGCCATCCGCTGGC 

TCMGGATGGCCGGCCCCTCCCGGCTGACAGCCGCI^ACC^GCXSC^TCACT^ 

G A CCAT CAGCG ACTTG CGG AC CG AGG ACAG CGG CAC CT ACATTT GTG AGG T CAC CAAC 

ACCTTCGGTGAGGCCACAGGCATCCTCATGGTCATTGGTGAGGAGCCCCCCGACCCCC 

CAGAGCTGGAGATCCGGGAGGTGAAGGCCCGGAGCATGAACCTGCGCTGGACCCAGCG 

ATT CGACGGG AACAGCAT CAT CACGGG CTT CG ACATTG AAT ACAAG AACAAAT CAG AT 

TCCTGGGACTTCAAGCAGTCCACACGCAACATCTCCCCCACCATCAACCAGGCCAACA 

TTGTGGACTTGCACCCGGCATCTGTGTACAGCATCCGCATGTACTCTTTCAACAAGAT 

TGGCCGCAGTGAACCAAGCAAGGAGCTCACCATCAGCACTGAGGAGGCCTCAGCTCCC 

G ATGGGCCCCCCATGGATGTTACCTTGCAGCCAGTG AC CTCACAG AGCAT CCAGGTGA 

CCTGGAAGCAGGCACCCAAGAAGGAGCTGCAGAACGGTGTCATCCGGGGCTACCAGAT 

TGGCTACAGAG AG AACAGCCCCGG CAGC AACGGG CAGT ACAGCATCGTGGAGATGAAG 

GCCACGGGGG ACAG CG AGGT CTACACCCTGGACAACCTCAAGAAGTTCGCCCAGT ATG 

GGGTGGTGGT CCAGGC CTTC AATCGGGCTGG CACGGGG CCCTCTTCCAGCG AGATCAA 

TGCCACCACTCTGGAGGATGTGCCCAGCCAGCCCCCTGAGAACGTCCGGGCCCTGTCC 

ATCACTTCTGACGTGGCCGTCATCTCCTGGTCAGAGCCCCCGCGCAGCACCCTCAATG 

GCGTCCTCAAAGGCTATCGGGTCATCTTCTGGTCCCTCTATGTTGATGGGGAGTGGGG 

CGAGATGCAG AACATCAC CACCACG CGGG AGCGGGTGG AGCTGCGGGGCATGG AGAAG 

TTCACCAACTACAGCGTCCAGGTGCTGGCCTACACCCAGGCTGGGGACGGCGTACGCA 

GCAGTGTGCTCTACATCCAGACCAAGGAGGACGTTCCAGGTCCCCCTGCTGGCATCAA 

AGCTGTCCCTTCATCAGCTAGCAGTGTGGTTGTGTCTTGGCTCCCCCCTACCAAGCCC 

AACGGGX5TGATCCX3CAAGTACACCATCTTCTGTTCCAGCCCCGCCCCGCAGGCTCCCA 

GCG AGTACGAGACG AGTCCAGAGCAGCTCTTCT ACCGGAT CG CC CACCTAAACCGCGG 

T C AG CAGT AT CTG CTGTG GG TGG CCGCCGT CAC CTCTGCCGGCCGGGG CAAC AG CAGC 

GAG AAGGTGACC ATCGAG CCTGCTGGCAAGGCCCCAG C AAAGATCATCTCCTTTGGGG 

G CA C CG TG AC AACACCTTGG ATG AAAG ATGTTC GG CTG C C TTG CAATT C AG TGGG AG A 

TCCAGCCCCTGCTGTGAAGTGGACCAAGGACAGTGAAGACTCGGCCATTCCAGTGTCC 

ATGG ATGGGC AC CGG CTCAT C CACAC CAATGG CACACTG CTG CTG CGTG CAG TG AAGG 

CTGAG^ACTCTGGCTACTACACGTGCACGGCCACCAACACTGGTGGCTTTGACACCAT 

CATCGTCAACCTTCTGGTGCAAGTTCCCCCGGACCAGCCCCGCCTCACTGTCTCCAAA 

ACCTCAGCTT CGTCCATCACCCTGACCTGG ATTC CAGGTGACAATGGGGGCAGCTCCA 

TCCGAGGTTTTGTGCTACAGTACTCGGTGGACAACAGCGAGGAGTGGAAGGATGTGTT 

CATCAGCTCCAGCG AG CG CTCCTTCAAGCTGGACAGCCTCAAGTGTGGCACGTGGT AC 

AAGG TG AAG CTGG CAG CC AAG AACAG CG TGGG CT CTG GGCGCAT CAG CG AG AT CAT CG 

AGGCCAAGACCCACGGGCGGGAGCCCTCCITCAGCAAAGACCAACACCTCTTCACCCA 

CATCAACTCCACGCATGCTCGGCTTAACCTGCAG^GCTGGAACAATGGGGGCTGCCCT 

ATCACAGCCATCGTTCTGGAGTACCX^CCCAAGGGGACCTGGGCCTXK5CAGGGCCT 

GGGCCAACAGCTCCGGKMAGGTGTTTCTGACGGAACTGCGAGAGGCCACGTGGTACGA 

GCTGCGCATGAGGG CTTC CAACAGTG03GG CTG CX^CAATGAAACAGCCCAGT^ 

AC C CT GG ACT ACG ATGG CAGT AC C ATT C CA C CCAT C AAGT CTG CTCAAGGTG AAGGGG 

ATG ATGTGAAGAAG CTGTTCACCATCGGCTG CCCTGTC ATCCTGGCCACACTGGGGGT 

GGCACTGCTCTTCATCGTACGCAAGAAGAGGAAGGAGAAACGGCTGAAGCGACTCCGA 

GATGCAAAGAGTTTGG(^GAAATGTTGATAAGCAAGAACAATAGAAGCTTTGACACCC 

CTGTGAAAGGGCCACCCCAGGGCCCACGGCTACACATTGACATCCCCAGGGTCCAGCT 

GCTCATCGAGGACAAAGAAGGCATCAAGCAACTGGGTGAGGACAAGGCCACCATCCCT 

GTGACAGATGCTGAGTTCAGCCAAGCTGTCAACCC^CAGAGCTTCTGTACTGGCGTCT 

CCTTGCACCACCCAACCCTCATCCAGAGCACAGGACCCCTCATCGACATGTCTGACAT 

CCGGCCAGGAACCGATCCAGTGTCCAGGAAGAATGTGAAGTCAGCCCACAGCACCCGG 

AACCGGTACTCAAGCCAGTGGACCCTGACCAAGTGCCAGGCCTCCACACCTGCCCGCA 

CCCTCACCTCCGACTGGCGCACCGTGGGCTCCCAGCATGGTGTCACGGTCACTGAGAG 
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TG ACAGCTACAGTGC CAGCCTGTCC CAGGACACAG ACAAAGG AAGGAACAGCATGGTG 
TCCACTGAGAGTGCCTCTTCCACCTACGAGGAGCTGGCCCGGGCCTATGAGCATGCCA 
AGCTGGAGGAGCAGCTGCAGCACGCCAAGTTTGAGATCACCGAGTGCTTCATCTCTGA 
CAGTTCCTCTGACCAGATGACCACAGGCACCAACGAGAACGCCGACAGCATGACATCC 
ATGAGCACACCCTCAGAGCCTGGCATCTGCCGCTTTACCGCCTCACCACCCAAGCCCC 
AGGATGCGGACCGGCTGCTGATGCTGGTCCCAGGTGCCCACCTCCCTCCTCAGTCCAT 
CCATGTTGT AGCAT ATGTCAGAATTTC CTTCTTACTG AACAAGGGTGGGGGAGAC CTG 
GCTTCTGATCTTAG CTCCGGCAGAGCTTG CAGTGAG C CGAG ATCACG CGG CACC CGG C 
CACCAACACTGGTGGCTTTGACACCAT C ATCGTCAAC CTGTGAGGCAGGTGACC CCAG 
GTGGGGACAGGGATGGAGAAAGGGTAGGGATTCCATCATGCGAGAGGGTCATCGAATG 
GAAGAAGCCAAACCAAGGGAGAGACAGACCTCTGGAGAAACAGAGGTGCACATGGAAG 
GGGAGGCAGGAGAGCTGGGGAGTGGGAGTGGGAGTGAGGGTGTGGGAGAGCCAGCACC 
TTCCCGTCACGGGGGGACTCCCCACACCCCATCACAGGGTCCGCCCTTGTGCTAAGGG 
GTGGTGGCTTTCCCCTCACAGTTCCCCCGGACCAGCCCCGCCTCACTGTCTCCAAAAC 


CTCAGCTTCGTCCAfCACCCTGACCTGGATTCCAGGTGACAATGGGGGCAGCTCCATC 


CGAGGTGAGGAGGGGTCTGGATGCGGGGGAAGATAGGGGAAGGAATTCTGGGCCCGGG 


G CAGGGAAGGGG CTTCA 




ORF Start: ATG at 129 


ORF Stop: TAA at 5853 




SEQ ID NO: 266 


1908 aa 


MW at 208575.3 Da 


NOV92b, 

CG59754-01 Protein Sequence 


M P I R I TWRKDGQV 1 1 SGSG VT I ES KE FMS S LQ I S SVS LKHNGNYTC I ASNAAATVS IV 
SPEHRFFITYHGGLYISDVQKEDALSTYRCITKHKYSGETRQSNGARLSVTDPAESIP 
TILDGFHSQEVWAGHTVELPCTASGYPIPAIRWLKDGRPLPADSRWTKRITGLTISDL 
RTEDSGTYICEVTNTFGSAEATGILMVIDPLHVTLTPKKLKTGIGSTVILSCALTGSP 
EFTIRWYRNTELVLPDEAISIRGLSNETLLITSAQKSHSGAYQCFATRKAQTAQDFAI 
I ALbUG rPRXVooro EKV VN PCjE Q F S LnCAAKG AP P PTVTW AhDuE P 1 VKDLj oHK I Ny 
YTMSDGTT I SHMNVTG PQ I RDGG VYRCTARNLVGS AE YQAR I NVRGP P S I RAMRN I TA 
VAGRDTLINCRVIGYPYYS I KWYKDALLLPDNHRQ WFENGTLKLTDVQKGMDEGEYL 
CSVLI QPQLS I SQS VHVAVKVPPLIQPFEF P PAS I GQLLYI PCWS SGDMPI RI TWRK 
DGQVI I SGSGVTI ESKEFMSSLQI S SVS LKHNGNYTC I ASNAAATVSRERQLI VRVPP 
RFWQ PNNQDG I YGKAGVLNCS VDG Y P P PKVMWKHAKG SGN PQQ YH P V PLTGR I Q I LP 
NSSLLIRHVLEED IGYYLCQASNGVGTD I S KSMFLTVKI PT I LDGFHSQEVWAGHTVE 
LPCTASGYPIPAIRWLKDGRPLPADSRWTKRITGLTISDLRTEDSGTYICEVTNTFGE 
ATGIIjMVIGEEPPDPPELEIREVKARSMNLRWTQRFDGNSIITGFDIEYKNKSDSWDF 
KQSTRNISPTINQANIVDLHPASVYSIRMYSFNKIGRSEPSKELTISTEEASAPDGPP 
MDVTLQPVTSQS IQVTWKQAPKKELQNG VI RGYQIGYRENS PGSNGQYS I VEMKATGD 
SEVYTI i DNLFCKFAQYG\AA^QAFNRAGTGPSSSEINATTLEDVPSQPPENVRALSITSD 
VAVI SW SEP PRST LNGVLKG YR VI FWS L YVDGE WG EMQN ITTTR ERVE LRGME KFTNY 
S VQVLAYTQAGDGVRSSVLYIQTKEDVPG P PAG I KAVPSSASS VWSWLP PTKPNGVI 
RKYT I FCSS PAPQAPSEYETS PEQLFYRI AHLNRGQQYLLWVAAVTSAGRGNSSEKVT 
IEPAGKAPAKIISFGGTVTTPWMKDVRLPCNSVGDPAPAVKWTKDSEDSAIPVSMDGH 
RLI HTNGTLLLRAVKAEDSGYYTCTATNTGGFDTI I VNLLVQVP PDQPRLTVSKTSAS 
S ITLTW I PGDNGGSS I RGFVLQYSVDNSEEWKDVF I SSS ERS FKLDSLKCGTWYKVKL 
AAKNSVGSGRI SE 1 1 EAKTHGRE PS F S KDQH LFTH INSTHARLNLQGWNNGG C P I TAI 
VLEYRPKGTWAWQGLRANSSGEVFLTELREATWYELRMRACNSAGCGNETAQFATLDY 
DGSTI PPI KS AQGEGDDVKKLFT IGC PVI LATLGVALLF I VRKKRKEKRLKRLRDAKS 
LAEMLI S KNNRS FDTPVKGP PQGPRLH I D I PRVQLLI EDKEG I KQLGEDKAT I P VTDA 
EFSQAVNPQSFCTGVSLHHPTLIQSTGPLIDMSDIRPGTDPVSRKNVKSAHSTRNRYS 
S QWTLT KCQAST P ART LT SD WRTVG S QHG VTVTE S D S YS AS LS QDTDKGRNS MVS T E S 
AS STYEE LARAYEHAKLEEQLQHAKFE I TECFISDS S SDQMTTGTNENADSMTSMSTP 
S E PG I CRFTAS P PKPQDADRLLMLVPGAHLP PQS I HWAYVR I S FLLNKGGGDLASDL 
SSGRACSEPRSRGTRPPTLVALTPSSSTCEAGDPRWGQGWRKGRDSIMREGHRMEEAK 
PRERQTSGETEVHMEGEAGELGSGSGSEGVGEPAPSRHGGTPHTPSQGPPLC 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 92B. 



Table 92B. Comparison of NOV92a against NOV92b. 


Protein Sequence 


NOV92a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV92b 


1..1771 
1..1760 


1663/1773 (93%) 
1681/1773 (94%) 



Further analysis of the NOV92a protein yielded the following properties shown in 
Table 92C. 



Table 92C. Protein Sequence Properties NOV92a 


PSort 
analysis: 


0.7000 probability located in plasma membrane; 0.3000 probability located in 
microbody (peroxisome); 0.3000 probability located in nucleus; 0.2000 
probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV92a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 92D. 



Table 92D. Geneseq Results for NOV92a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV92a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for the 
Matched Region 


Expect 
Value 


AAU28091 


Novel human secretory protein, Seq 
ID No 260 - Homo sapiens, 1744 
aa. [WO200166689-A2, 
13-SEP-2001] 


200..1943 
1..1744 


1744/1744(100%) 
1744/1744(100%) 


0.0 


AAM78713 


Human protein SEQ ID NO 1375 - 
Homo sapiens, 1744 aa. 
[WO200157190-A2, 
09-AUG-2001] 


200..1943 
1..1744 


1744/1744(100%) 
1744/1744(100%) 


0.0 


AAM39040 


Human polypeptide SEQ ID NO 
2185 - Homo sapiens, 1744 aa. 
[WO200153312-A1, 26-JUL-2001] 


200..1943 
1..1744 


1744/1744(100%) 
1744/1744(100%) 


0.0 


AAW42086 


Human Down syndrome-cell 
adhesion molecule DS-CAM1 - 
Homo sapiens, 1910 aa. 
[WO9817795-A1, 30-APR-1998] 


44..1778 
154..1890 


1085/1745(62%) 
1357/1745(77%) 


0.0 


AAW42087 


Human Down syndrome-cell 
adhesion molecule DS-CAM2 - 
Homo sapiens, 1571 aa. 
[WO9817795-A1, 30-APR-1998] 


44..1457 
154..1564 


890/1416(62%) 
1109/1416(77%) 


0.0 

In a BLAST search of public sequence databases, the NOV92a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 92E. 



Table 92E. Public BLASTP Results for NOV92a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV92a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for the 
Matched Portion 


Expect 
Value 


A AT ^71 fif\ 


DOWN SYNDROME CELL 

ADHESION MOLECULE 
DSCAML1 - Homo sapiens 
(Human), 2053 aa. 


AA 1 QA'X 

155..2053 


i rrq/1 onn (qq%\ 

1007/ 17W ^77/0^ 

1892/1900 (99%) 


0.0 


Q9ULT7 


KIAA1132 PROTEIN - Homo 
sapiens (Human), 1822 aa 
(fragment). 


122..1943 
1..1822 


1822/1822 (100%) 
1822/1822 (100%) 


0.0 


O60469 


Down syndrome cell adhesion 
molecule precursor (CHD2) - 
Homo sapiens (Human), 2012 aa. 


44..1943 
154..2012 


1123/1920 (58%) 
1410/1920(72%) 


0.0 


Q9ERC8 


DOWN SYNDROME CELL 
ADHESION MOLECULE - Mus 
musculus (Mouse), 2013 aa. 


44..1943 
154..2013 


1119/1921 (58%) 
1405/1921 (72%) 


0.0 


AAL57167 


DOWN SYNDROME CELL 
ADHESION MOLECULE 
DSCAM - Rattus norvegicus (Rat), 
2013 aa. 


44..1943 
154..2013 


1119/1921 (58%) 
1405/1921 (72%) 


0.0 



PFam analysis predicts that the NOV92a protein contains the domains shown in the 
Table 92F. 



Table 92F. Domain Analysis of NOV92a 


Pfam Domain 


NOV92a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


ig: domain 1 of 10 


1..48 


12/49 (24%) 
38/49 (78%) 


2.7e-05 






ig: domain 2 of 10 


72..90 


8/19 (42%) 
14/19 (74%) 


85 


ig: domain 3 of 10 


130..186 


22/60 (37%) 
46/60 (77%) 


2.1e-14 


ig: domain 4 of 10 


219..278 


16/63 (25%) 
44/63 (70%) 


4.9e-09 


ig: domain 5 of 10 


312..377 


14/69 (20%) 
50/69 (72%) 


1.5e-07 


ig: domain 6 of 10 


409..467 


12/61 (20%) 
41/61 (67%) 


4.8e-05 


ig: domain 7 of 10 


500..561 


17/64(27%) 
49/64 (77%) 


3.2e-11 


ig: domain 8 of 10 


594..659 


19/69(28%) 
47/69 (68%) 


9.4e-07 


ig: domain 9 of 10 


693..759 


9/70 (13%) 
47/70 (67%) 


7.9e-06 


fn3: domain 1 of 6 


777..864 


22/89 (25%) 
65/89 (73%) 


3e-16 


fn3: domain 2 of 6 


876..968 


33/93 (35%) 
68/93 (73%) 


3.1e-16 


fn3: domain 3 of 6 


980..1069 


26/93 (28%) 
69/93 (74%) 


2.9e-16 


fn3: domain 4 of 6 


1081..1167 


24/88 (27%) 
64/88 (73%) 


3.7e-17 


ig: domain 10 of 10 


1194..1255 


17/65 (26%) 
46/65 (71%) 


4.3e-09 


fn3: domain 5 of 6 


1274..1357 


30/86 (35%) 
67/86 (78%) 


1.2e-18 


fn3: domain 6 of 6 


1371..1453 


27/86(31%) 
53/86 (62%) 


0.045 
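
The match regions in the table above can be read as a linear domain architecture. The sketch below orders the regions by start position, checks that neighbouring regions do not overlap, and counts each domain type. The coordinates are copied from the table; reading every "m3"/"fh3" label as the Pfam fn3 family is an assumption based on the final row, and the names used in the code are illustrative.

```python
from collections import Counter

# (Pfam family, first residue, last residue) as listed in Table 92F.
DOMAINS = [
    ("ig", 1, 48), ("ig", 72, 90), ("ig", 130, 186), ("ig", 219, 278),
    ("ig", 312, 377), ("ig", 409, 467), ("ig", 500, 561), ("ig", 594, 659),
    ("ig", 693, 759), ("fn3", 777, 864), ("fn3", 876, 968),
    ("fn3", 980, 1069), ("fn3", 1081, 1167), ("ig", 1194, 1255),
    ("fn3", 1274, 1357), ("fn3", 1371, 1453),
]

def architecture(domains):
    """Return family names in N- to C-terminal order, raising
    ValueError if two neighbouring regions overlap."""
    ordered = sorted(domains, key=lambda d: d[1])
    for (_, _, prev_end), (_, next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError("overlapping domain regions")
    return [name for name, _, _ in ordered]

if __name__ == "__main__":
    arch = architecture(DOMAINS)
    print("-".join(arch))
    print(Counter(arch))
```
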



Example 93. 

The NOV93 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 93A. 



Table 93A. NOV93 Sequence Analysis 




SEQ ID NO: 267 1272 bp 


NOV93a, 

CG59800-01 DNA Sequence 


AGAGCCTGGTATGCAGGAGGTGCTCTGTAAATACCTGCCCCATACATACCCGCCCCAT 
ACATACCCACCCCATACATACCCACCCCATACATACCTGCCCTGTCCATACCTGCCCC 
CTACATACCTGCCCCGTCCATACCTGCCCCCTACATACCTGCCCCGTCCATACCTGCC 
CCCTACATACCTGCTCTGTCTATACCTGTGG CT AGGACTGTGGCCTTG CTTCTTAG CC 
GCTCAGAGCCTGCCTCCTCCTCTGCAGAGTGGCGGTGGTAGCAGGGCTTCCCGCGCGC 





CGATGCTGCTCGTGGCCCTGGTGCTCGGCGCCTACTGCCTCTGCGCCCTCCCCGGCCG 
CTGCCCGCCGGCCGCCCGCGCCCCCGCGCCGGCCCCCGCGCCCTCCGAGCCGTCCAGC 
TCCGTCCACCGCCCGGGAGCACCCGGCCTGCCTTTGGCCAGCGGTCCCGGCCGCCGGC 
GCTTCCCGCAAGCGCTCATCGTTGGCGTGAAGAAGGGCGGCACGCGCGCCCTGCTGGA 
GTTTCTGCGGCTGCACCCCGACCTCCGCGCGCTGGGCTCTGAGCCCCACTTCTTCGAC 
AGGTGCCCCGACCGCGGCCTCGCCTGGTCCCGGAGTCTGATGCCCCGAACCCTGGATG 
GGCAGATCACCATGGAGACGACCCCGGGCTACTTCGTGACGCGAGAGGCCCCCCGCCG 
CATCCACGCCATGTCCCCGGACACGAAGCTGATCGTGGTGGTGCGGAACCCCGTGACC 
CGGGCCATCTCCGACTAGGCCCAGACGCTCTCCAAGACCCCGGGCCTGCCCAGCTTCC 
GCGCCCTGGCCTTCCGCCACGGCCTGGGCCCCGTGGACACAGCCTGGAGCGCCGTCCG 
CATCGGCCTGTACGCCCAGCACCTGGACCACTGGCTGCGCTACTTCCCCCTGTCCCAC 
TTCCTGTTCGTCAGCGGGGAGCGTCTGGTCAGCGACCCGGCCGGAGAGGTCGGCCGCG 
TGCAGGACTTCCTGGGCCTGAAACGGGTCGTCACGGACAAGCACTTCTACTTCAACGC 
CACCAAGGGCTTCCCCTGCCTCAAGAAGGCCCAGGGCGGCAGCCGTCCCCGCTGCCTG 
GGCAAGTCCAAGGGCCGGCCACACCCACGCGTGCCCCAGGCCGTGGTCCGGCGCCTGC 
AGGAGTTCTACCGGCCCTTCAACCGCAGGTTCTACCAGATGACGGGCCAGGACTTCGG 
CTGGGGCTGAGCGGCACCCTGGGGATGCTCAGCACCTTGATTGACACCCGCTCG 




ORF Start: GAG at 2 


ORF Stop: GGC at 1217 




SEQ ID NO: 268 


405 aa 


MW at 43994.8 Da 


NOV93a, 

CG59800-01 Protein Sequence 


MQEVLCKYLPHTYPPHTYPPHTYPPHTYLPCPYLPPTYLPRPYLPPTYLPRPYLPPTY 
LLCLYLWLGLWPCFLAAQSLPPPLQSGGGSRASRAPMLLVALVLGAYCLCALPGRCPP 
AARAPAPAPAPSEPSSSVHRPGAPGLPLASGPGRRRFPQALIVGVKKGGTRALLEFLR 
LH PDXRALG S EXH FFDRCXXXGLXWXRS LM PRTLDGQ I TMEXT PXYFVTREAPRRI HA 
MS PDTKLI VVVRNPVTRAI SDXXQTLSKTPGLPSFRAIiAFRHGLGPVDTAWSAVRIGL 
YAQHLDHWLRYFPLSHFLFVSGERLVSDPAGEVGRVQDFLGLKRWTDKHFYFNATKG 
F P CLKKAQGG S RPRCLGKS KGR PH PRVPQAXVRRLQE FYRPFNRRFYQMTGQD FGWG 



Further analysis of the NOV93a protein yielded the following properties shown in 
Table 93B. 



Table 93B. Protein Sequence Properties NOV93a 


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.4000 probability located in 
Golgi body; 0.3000 probability located in endoplasmic reticulum (membrane); 
0.1000 probability located in mitochondrial inner membrane 


SignalP 
analysis: 


Likely cleavage site between residues 7 and 8 
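
A predicted cleavage site between residues 7 and 8 divides the polypeptide into a putative signal peptide and the remaining chain. As a simple illustration, the split can be written as below; the function name is hypothetical, and the input is only the first 16 residues of the NOV93a protein sequence.

```python
def split_signal_peptide(sequence: str, last_signal_residue: int) -> tuple[str, str]:
    """Split a protein sequence after the given 1-based residue number,
    returning (signal_peptide, remaining_chain)."""
    if not 0 < last_signal_residue < len(sequence):
        raise ValueError("cleavage position lies outside the sequence")
    return sequence[:last_signal_residue], sequence[last_signal_residue:]

if __name__ == "__main__":
    # Cleavage predicted between residues 7 and 8 of NOV93a.
    signal, rest = split_signal_peptide("MQEVLCKYLPHTYPPH", 7)
    print(signal, rest)
```
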



A search of the NOV93a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 93C. 



Table 93C. Geneseq Results for NOV93a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV93a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 




AAB95507 


Human protein sequence SEQ ID 
NO: 18067 - Homo sapiens, 390 aa. 
[EP1074617-A2, 07-FEB-2001] 


31..253 
11..237 


121/229 (52%) 
146/229 (62%) 


4e-55 


AAY17066 


Human 3-OST-3B protein - Homo 
sapiens, 390 aa. [WO9922005-A2, 
06-MAY-1999] 


31..253 
11..237 


121/229 (52%) 
146/229 (62%) 


4e-55 


AAB70115 


Human 3-OST-3B - Homo sapiens, 
391 aa. [WO200113910-A2, 
01-MAR-2001] 


31..253 
11..238 


121/230 (52%) 
146/230 (62%) 


9e-54 


AAB70114 


Murine 3-OST-3B - Mus sp, 391 aa. 
[WO200113910-A2, 01-MAR-2001] 


31..253 
11..238 


119/231 (51%) 
147/231 (63%) 


2e-51 


AAU12275 


Human PRO5004 polypeptide 
sequence - Homo sapiens, 367 aa. 
[WO200140466-A2, 07-JUN-2001] 


86..253 
45..214 


102/170 (60%) 
117/170(68%) 


9e-48 



In a BLAST search of public sequence databases, the NOV93a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 93D. 



Table 93D. Public BLASTP Results for NOV93a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV93a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96QI5 


C439A6.1 (NOVEL PROTEIN 
SIMILAR TO HEPARAN SULFATE 
(GLUCOSAMINE) 3-O- 
SULFOTRANSFERASES) - Homo 
sapiens (Human), 381 aa (fragment). 


85..253 
61..229 


160/169 (94%) 
162/169 (95%) 


2e-89 


Q96RX7 ! 


HEPARAN SULPHATE D- 
GLUCOSAMINYL 3-0- 
SULFOTRANSFERASE-3 B LIKE - 
Homo sapiens (Human), 3 1 1 aa. 


95..253 
1..159 | 


153/159(96%) 
155/159 (97%) 


le-85 


Q9Y662 | 


HEPARAN SULFATE D- 
GLUCOSAMINYL 3-0- 
SULFOTRANSFERASE-3B (EC 
2.8.2.23) - Homo sapiens (Human), 390 
aa. 


31. .253 
11. .237 


121/229 (52%) 
146/229 (62%) 


le-54 


Q9QZS6 1 


D-GLYCOSAMINYL 3-0- 
SULFOTRANSFERASE-3B - Mus 
musculus (Mouse), 390 aa. 


31. .253 
11. .237 


119/230 (51%) 
147/230 (63%) 


3e-52 



370 



WO 02/072757 



PCT/US02/06908 



Q9Y278 j 


HEPARAN SULFATE D- 


86..253 


102/170(60%) 


3e-47 




GLUCO S AMINYL 3-0- 


45..214 


117/170(68%) 






SULFOTRANSFERASE-2 (EC 








2.8.2.23) - Homo sapiens (Human), 367 










aa. 









PFam analysis predicts that the NOV93a protein contains the domains shown in 
Table 93E. 



Table 93E. Domain Analysis of NOV93a 



Pfam Domain | NOV93a Match Region | Identities/Similarities for the Matched Region | Expect Value
No Significant Matches Found



Example 94. 

The NOV94 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 94A. 



Table 94A. NOV94 Sequence Analysis 



SEQ ID NO: 269 2949 bp 



NOV94a, 

CG59761-01 DNA Sequence 



GTCCGCCTCCGGGCCGCCGAGCCQCAGCCGCCGAGATGGGGGCCGCCCCGGGCCGCGC 



CCCCGCCGGGTCCCGCCCGCCGCGCTGCCGCTGAGCGC ATGGGCCCGGACCGCGCCGC 
GCCGCTCCGGGAGCCGGGCCCGGGGTCCCGCCACCACCGCGCGCGGGACAGATTGATT 
CACTTTCGAGCTGTAAGTACIGATGTATTAGGGTCCAGCGCTCATTGTTCATTGACGC 
AGAGTCCCAAAATGAATATCCAAGAGCAGGGTTTCCCCTTGGACCTCGGAGCAAGTTT 
CACCGAAGATGCTCCCCGACCCCCAGTGCCTGGTGAGGAGGGAGAACTGGTGTCCACA 
GACCCGAGGCCOjCCAGCTACAGT^CTGCTCCGGGAAAGGTGTTGGCATTAAAGGTG 
AG ACTTCGACGGC CACT CCGAGGCGCTCGG ATCTGGACCTGGGGTATGAGCCTGAGGG 
CAGTGCCTCCCCCACCCCACCATACTTGAAGTGGGCTGAGTCACTGCATTCCCTGCTG 
GATGACCAAGATGGGATAAGCCTGTTCAGGACTTTCCTGAAGCAGGAGGGCTGTGCCG 
ACTTGCTGGACTTCTGGTTTGCCTGCACTGGCTTCAGGAAGCTGGAGCCCTGTGACTC 
GAACGAGGAGAAGAGGCTGAAGCTGGCGAGAGCCATCTACCGAAAGTACATTCTTGAT 
AACAATGGCATCGTGTCCCGGCAGACCAAGCCAGCCACCAAGAGCTTCATAAAGGGCT 
G CAT CATGAAG CAG CTG AT CG AT C CTG C CATGTTTGAC C AGG C C C AG AC CG AAAT CCA 
GG C CACT ATGG AGG AAAACACCT AT C C CT CCTTCCTTAAG T CTGAT ATTT ATTTG G AA 
TATACGAGGACAGGCTCGGAGAGCCCCAAAGTCTGTAGTGACCAGAGCTCTGGGTCAG 
GGACAGGGAAGGGCATATCTGGATACCTGCCGACCTTAAATGAAGATGAGGAATGGAA 
GTGTGACCAGGACATGGATGAGGACGATGGCAGAGACGCTGCTCCCCCCGGAAGACTC 
CCTCAGAAGCTGCTCCTGGAGACAGCTGCCCCGAGGGTCTCCTCCAGTAGACX3GTACA 
GCGAAGG CAGAGAGTTC AGGT ATGGATCCTGGCGGGAGC CAGTCAACCCCTATTATGT 
CAATGCCGGCTATGCCCTGGCCCCAGCCACCAGTGCCAACGACAGCGAGCAGCAGAGC 
CTGTCCAGCGATGCAGACACCCTGTCCCTCACGGACAGCAGCGTGGATGGGATCCCCC 
CATACAGGATCCGTAAGCAGCACCGCAGGGAGATGCAGGAGAGCGTGCAGGTCAATGG 
GCGGGTGCCCCTACCTCACATTCCCCGCACGTACCGGGTGCCGAAGGAGGTCCGCGTG 
GAGCCTCAGAAGTTCGCGGAGGAGCTCATCCACCGCCTGGAGGCTGTGCAGCGCACGC 
GGGAGGCCGAGGAGAAGCTGGAGGAGCGGCTGAAGCGCGTGCGCATGGAGGAGGAAGG 
TGAGGACGGCGATCCATCATCAGGGCCCCCAGGGCCGTGTCACAAGCTGCCTCCCGCC 
CCCGCTTGGCACCACTTCCCGCCCCGCCTGTGTTGGACATGGGCTTGTGCCGGGCTCC 
GGGATGCACACGAGGAGAACCCTGAGAGCATCCTGGACGAGCACGTACAGCGTGTGCT 
GAGGACACCTGGCCGCCAGTCGCCTGGGCCTGGCCATCGCTCCCCGGACAGTGGGCAC 
G TGG C CAAGATGC C AG TGG CACTGGGGGG TG CCG CCT CGGGG C ACGG G AAG CACG T AC 
CCAAGTCAGGGGCGAAGCTGGACGCGGCCGGCCTGCACCACCACCGACACGTCCACCA 
CCACGTCCACCACAGCACAGCCCGGCCCAAGGAGCAGGTGGAGGCCGAGGCCACCCGC 
AGGGCCCAGAGCAGCTTCGCCTGGGGCCTGGAACCACACAGCCATGGGGCAAGGTCCC 
GAGGCTACTCAG AG AGTGTTGGCGCTGCCCC CAACGC CAGTG ATGGCCTCGCC CACAG 
TGGGAAGGTGGGCGTGGCGTGCAAAAGAAATGCCAAGAAGGCCGAGTCGGGGAAGAGC 
GCCAGCACCGAGGTGCCAGGTGCCTCGGAGGATGCGGAGAAGAACCAGAAAATCATGC 








AGTGGATCATTGAGGGGGAAAAGGAGATCAGCAGGCACCGCAGGACCX3GCCACGGGTC 
TT CGGGG ACGAGG AAGCCACAGCCCCATGAGAACTCCAG ACCCTTGTC CCTTG AGCAC 
CCCTGGGCCGGCCCTCAGCTCCGGACCTCCGTGCAGCCCTCCCACGTCTTCATCCAAG 
ACCGCACCATGCCACCCCACCCAGCTCCCAACCCCCTAACCCAGCTGGAGGAGGCGCG 
CCGACGTCTGGAGGAGGAAGAAAAGAGAGGCAGCCGAGCACCCTCCAAGCAGAGGTAT 
GTGCAGGAGGTTATGCGGCGGGGACGCGCCTGCGTCAGGCCAGCGTGCGCGCCGGTGC 
TGCACGTGGTACCAGCCGTGTCGGACATGGAGCTCTCCGAGACAGAGACAAGATCGCA 
GAGG AAGGTGGG CGGCGGG AGTGCC CAGCCGTGTG ACAGCATCGTTGTGGCGTACTAC 
TTCTGCGGGGAACCCATCCCCTACCGCACCCTGGTGAGGGGCCGCGCTGTCACCCTGG 
GCCAGTTC AAGGAGCTG CTG ACCAAAAAGGG CAGCTACAGATACT ACTTCAAG AAAGT 
GAGCGACGAGTTTGACTGTGGGGTGGTGTTTGAGGAGGTTCGAGAGGACGAGGCCGTC 
CTGCC CGTCTTTG AGG AG AAG AT C ATCGGCAAAGTGGAGAAGGTGG ACTGATAGG CTG 
GTGGGCTGGCCGCTGTGCCAGGCGAGGCCCTTGGCGGGCACGGGTGTCACGGCCAGGC 


AGATG AC CTCGTACTCAGGAG CCCG ATGGGGAACAGTGTTGGGTGT ACC 




ORF Start: ATG at 97 


ORF Stop: TGA at 2833 




SEQ ID NO: 270 


912 aa MW at 101118.1kD 


NOV94a, 

CG59761-01 Protein Sequence 


MGPDRAAPLREPGPGSRHHRARDRLIHFGAVSTDVU3CSAHCSLTQSPKMNIQEQGFP 
LDLGASFTEDAPRPPVPGEEGELVSTDPRPASYSFCSGKGVGIKGETSTATPRRSDUD 
LGYEPEGSASPTPPYLKWAESLHSLLDDQDGISLFRTFLKQEGCADLLDFWFACTGFR 
KLEPCDSNEEKRLKLARAIYRKYILDNNGIVSRQTKPATKSFIKGCIMKQLIDPAMFD 
QAQTEIQATMEENTYPSFLKSDIYLEYTRTGSESPKVCSDQSSGSGTGKGISGYLPTL 
NEDE EWKCDQDMDEDDGRDAA P PGRLPQKLLLETAA PR VS S S RR YS EGRE FR YG SWRE 
PVNPYYVNAGYALAPATSANDSEQQSLSSDADTLSLTDSSVDGIPPYRIRKQHRREMQ 
ESVQVNGRVPLPHIPRTYRVPKEVRVEPQKFAEELIHRLEAVQRTREAEEKLEERLKR 
VRMEEEGEDGD PSSGPPG PCHKLPPAPAWHHF PPRLCWTWACAGLRDAHEENPES I LD 
EHVQRVLRTPGRQSPGPGHRSPDSGHVAKMPVAIiGGAASGHGKHVPKSGAKLDAAGIiH 
HHRHVHHHVHHSTARPKEQVEAEATRRAQSSFAWGLEPHSHGARSRGYSESVGAAPNA 
SDGLAHSGKVGVACKRNAKKAESGKSASTEVPGASEDAEKNQKIMQWIIEGEKEISRH 
RRTGHGSSGTRKPQPHENSRPLSLEHPWAGPQLRTSVQPSHLFIQDPTMPPHPAPNPL 
TQLEEARRRLEEEEKRASRAPSKQRYVQEVMRRGRACVRPACAPVLHWPAVSDMELS 
ETETRSQRKVGGGSAQPCDSIWAYYFCGEPIPYRTLVRGRAVTLGQFKELLTKKGSY 
RYYFKKVSDEFDCGWFEEVREDEAVLPVFEEKIIGKVEKVD 
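
The ORF coordinates and the polypeptide length reported in Table 94A can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that both reported positions are 1-based and that each marks the first nucleotide of its codon (the A of the ATG start codon and the first base of the stop codon); the function name is illustrative and is not part of the disclosure.

```python
def orf_codon_count(start_pos, stop_pos):
    """Return the number of translated codons in an open reading frame.

    Assumption: start_pos is the 1-based position of the first base of the
    start codon and stop_pos is the 1-based position of the first base of
    the stop codon, so the stop codon itself is not counted.
    """
    span = stop_pos - start_pos
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3


# NOV94a: ORF Start ATG at 97, ORF Stop TGA at 2833 (Table 94A)
nov94a_codons = orf_codon_count(97, 2833)
print(nov94a_codons)
```

Under these assumptions the computed codon count can be compared directly with the length stated for SEQ ID NO: 270 (912 aa); a mismatch would suggest a transcription error in one of the coordinates.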



Further analysis of the NOV94a protein yielded the following properties shown in 
Table 94B. 



Table 94B. Protein Sequence Properties NOV94a 


PSort analysis | 0.6000 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis | No Known Signal Sequence Predicted
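
The PSort result in Table 94B is a semicolon-separated list of probability and compartment pairs, which lends itself to simple parsing. The sketch below is illustrative only: the regular expression and the function name are assumptions made for this example, and the input string is copied from Table 94B.

```python
import re

# Matches one "<probability> probability located in <compartment>" entry.
PSORT_ENTRY = re.compile(r"(\d+\.\d+)\s+probability\s+located\s+in\s+([^;]+)")


def parse_psort(text):
    """Return (compartment, probability) pairs, highest probability first."""
    entries = []
    for probability, compartment in PSORT_ENTRY.findall(text):
        # Collapse any line-wrap whitespace inside the compartment name.
        entries.append((" ".join(compartment.split()), float(probability)))
    # sorted() is stable, so tied entries keep their original order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


nov94a_psort = (
    "0.6000 probability located in nucleus; "
    "0.3000 probability located in microbody (peroxisome); "
    "0.1000 probability located in mitochondrial matrix space; "
    "0.1000 probability located in lysosome (lumen)"
)
ranked = parse_psort(nov94a_psort)
print(ranked[0])
```

The first element of the ranked list is the compartment with the highest reported probability, which for NOV94a is the nucleus.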



A search of the NOV94a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 94C. 



Table 94C. Geneseq Results for NOV94a 


Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV94a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAG68175 | Wnt signaling protein SEQ ID NO:91 - Homo sapiens, 900 aa. [WO200177327-A1, 18-OCT-2001] | 13..912; 1..900 | 898/900 (99%); 898/900 (99%) | 0.0
AAW96264 | Human axin - Homo sapiens, 900 aa. [WO9902179-A1, 21-JAN-1999] | 13..912; 1..900 | 898/900 (99%); 898/900 (99%) | 0.0
AAW96265 | Murine axin - Mus musculus, 992 aa. [WO9902179-A1, 21-JAN-1999] | 6..912; 84..992 | 781/914 (85%); 820/914 (89%) | 0.0
AAW93569 | Human conductin protein - Homo sapiens, 840 aa. [WO9911780-A2, 11-MAR-1999] | 60..912; 12..840 | 378/892 (42%); 506/892 (56%) | e-171
AAW93570 | Human conductin protein - Homo sapiens, 840 aa. [WO9911780-A2, 11-MAR-1999] | 60..912; 12..840 | 378/892 (42%); 506/892 (56%) | e-171
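
The residue ranges in Table 94C use a "start..end" notation with inclusive coordinates, so the number of residues covered by each range follows directly. The helper below is a hypothetical sketch for this example, assuming 1-based inclusive coordinates; it is not taken from any tool used to produce the table.

```python
def span_length(residue_range):
    """Return the number of residues in an inclusive 'start..end' range."""
    start_text, end_text = residue_range.split("..")
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError("range end precedes range start")
    return end - start + 1


# First row of Table 94C: NOV94a residues 13..912 against match residues 1..900
query_span = span_length("13..912")
match_span = span_length("1..900")
print(query_span, match_span)
```

When the query span and the match span are equal to the alignment length given in the Identities column, the alignment contains no gaps. Where the alignment length is larger than either span, as in the murine axin row (914 alignment positions), the difference is presumably accounted for by gaps.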


In a BLAST search of public sequence databases, the NOV94a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 94D. 


Table 94D. Public BLASTP Results for NOV94a 


Protein Accession Number | Protein/Organism/Length | NOV94a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
O15169 | Axin 1 (Axis inhibition protein 1) (hAxin) - Homo sapiens (Human), 900 aa (fragment). | 13..912; 1..900 | 898/900 (99%); 898/900 (99%) | 0.0
Q96S28 | AXIN - Homo sapiens (Human), 862 aa. | 50..912; 1..862 | 858/863 (99%); 858/863 (99%) | 0.0
O35625 | Axin 1 (Axis inhibition protein 1) (Fused protein) - Mus musculus (Mouse), 992 aa (fragment). | 6..912; 84..992 | 781/914 (85%); 820/914 (89%) | 0.0
O70239 | Axin 1 protein (Axis inhibition protein 1) (rAxin) - Rattus norvegicus (Rat), 893 aa (fragment). | 6..912; 21..893 | 756/914 (82%); 793/914 (86%) | 0.0
T08422 | negative regulator axin [imported] - rat, 832 aa. | 46..912; 2..832 | 726/872 (83%); 760/872 (86%) | 0.0
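
Each identity entry in Table 94D pairs a fraction (matched positions over alignment length) with a whole-number percentage. The sketch below recomputes the percentage from the fraction, assuming the percentage is obtained by truncating rather than rounding; that assumption is an observation about the identity entries of this table and is not a documented property of the search tool. The check is applied to identity entries only, and the function name and regular expression are illustrative.

```python
import re

# Matches a cell such as "898/900 (99%)" or "898/900(99%)".
FRACTION = re.compile(r"(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def check_identity(cell):
    """Return (computed_percent, reported_percent) for one identity cell."""
    match = FRACTION.fullmatch(cell.strip())
    if match is None:
        raise ValueError("unrecognized identity cell: " + repr(cell))
    matched, aligned, reported = (int(group) for group in match.groups())
    # Assumption: the reported percentage is truncated, not rounded.
    computed = matched * 100 // aligned
    return computed, reported


# Identity entries of Table 94D
identity_cells = [
    "898/900 (99%)",
    "858/863 (99%)",
    "781/914 (85%)",
    "756/914 (82%)",
    "726/872 (83%)",
]
results = [check_identity(cell) for cell in identity_cells]
print(results)
```

A computed value that differs from the reported value would flag a cell worth re-reading against the original publication.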



PFam analysis predicts that the NOV94a protein contains the domains shown in 
Table 94E. 



Table 94E. Domain Analysis of NOV94a 


Pfam Domain | NOV94a Match Region | Identities/Similarities for the Matched Region | Expect Value
RGS: domain 1 of 2 | 137..198 | 23/75 (31%); 44/75 (59%) | 5.6e-06
RGS: domain 2 of 2 | 231..260 | 13/30 (43%); 21/30 (70%) | 0.12
TP2: domain 1 of 1 | 585..709 | 33/147 (22%); 52/147 (35%) | 9.6
DIX: domain 1 of 1 | 830..912 | 40/86 (47%); 83/86 (97%) | 5.6e-44
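
The Expect values in Table 94E span many orders of magnitude, and a reader may wish to separate well-supported domain predictions from marginal ones. The sketch below filters the rows of Table 94E by an Expect-value cutoff. The cutoff of 1e-3 is a hypothetical choice made for illustration and is not a threshold stated in the disclosure.

```python
# Rows of Table 94E: (Pfam domain, NOV94a match region, Expect value)
PFAM_HITS = [
    ("RGS: domain 1 of 2", "137..198", 5.6e-06),
    ("RGS: domain 2 of 2", "231..260", 0.12),
    ("TP2: domain 1 of 1", "585..709", 9.6),
    ("DIX: domain 1 of 1", "830..912", 5.6e-44),
]


def confident_domains(hits, max_expect=1e-3):
    """Return the names of hits whose Expect value is at or below the cutoff.

    max_expect is an illustrative default, not a value from the source.
    """
    return [name for name, _region, expect in hits if expect <= max_expect]


selected = confident_domains(PFAM_HITS)
print(selected)
```

With this illustrative cutoff, the first RGS match and the DIX match are retained, while the second RGS match and the TP2 match are set aside as weakly supported.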



Example 95. 

The NOV95 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 95A. 



Table 95A. NOV95 Sequence Analysis 




SEQ ID NO: 271 


2223 bp 


NOV95a, 

CG59756-01 DNA Sequence 


TTGCAGGCATCACCCACGCCCTCTGCACCCACGCTGGAGGACGGGGAGGTTGTCAGGG 


GCTATGATCAGATOAGTGGGGGCCGCTTCGACTTTGATGATGGAGGGGCGTACTGCGG 
GGGCTGGGAGGGGGGAAAGGCCCATGGGCATGGACTGTGCACAGGCCCCAAGGGCCAG 
GGCGAATACTCTGGCTCCTGGAACTTTGGCTTTGAGGTGGCAGGTGTCTACACCTGGC 
CCAGCGGAAACACCTTTGAGGGATACTGGAGCCAGGGCAAACGGCATGGGCTGGGCAT 
AGAGACCAAGGGGCGCTGGCTCTACAAGGGCGAGTGGACACATGGCTTCAAGGGACGC 
TACGGAATCCGGCAGAGCTCAAGCAGCGGTGCCAAGTATGAGGGCACCTGGAACAATG 
GCCTGCAAGACGGCTATGGCACCGAGACCTATGCTGATGGAGGGACGTACCAAGGCCA 
GTTCAC CAACGG CATG CG CCATX3GCTACGG AGTACG CCAG AGCGTGCCCTACGGG ATG 
GCCGTGGTGGTGCGCT CG CCGCTGCG CACGTCGCTGTCGT CCCTGCGCAGCGAG CACA 
GCAACGGCACGGTGGCCCCGGACTCTCCCGCCTCGCCGGCCTCCGACGGCCCCGCGCT 
GCCCT CG CCCGCCATC CCGCGTGGCGGCTT CG CGCTCAGC CTCCTGGCCAATGCCGAG 
GCGGCCGCGCGGGCGCCCAAGGGCGGCGGCCTCTTCCAGCGGGGCGCGCTGCTGGGCA 
AGCTGCGGCGCGCAGAGTCGCGCACGTCCGTGGGTAGCCAGCGCAGCCGTGTCAGCTT 
CCTTAAGAGCGACCTCAGCTCGGGCGCCAGCGACGCCGCGTCCACCGCCAGCCTGGGA 
GAGGCCGCCGAGGGCGCCGACGAGGCCG CACCCTTCGAGGC CGATATCGACGCCAC CA 
CCACCGAGACCTACATGGGCGAGTGGAAGAACGACAAACGCTCGGGCTTCGGCGTGAG 
CGAACGCTCCAGTGGCCTCCGCTACGAGGGCGAGTGGCTGGACAACCTGCGCCACGGC 
TATGGCTGCACCACGCTG CCCGACGG CCACCGCGAGGAGGGCAAGT ACCG CCACAACG 
TGCTGGT CAAGG ACAC CAAG CGC CG CATG CTGC AG CT CAAG AGC AACAAGGT CCGC CA 
GAAAGTGGAGCACAGTGTGG AGGGTG CCCAGCG CGCCGCTGCTATCGCGCGCCAGAAG 
GCCG AGATTGCCGCCTCCAGGACAAG CCACGCCAAGG CCAAAGCTGAGGCAGCGGAAC 
AGGCCGCCCTGG CTGCCAACCAGG AGTCCAACATTGCTCGC ACTTTGGCCAGGGAG CT 
GGCTCCGG ACTTCTACCAGCCAGGTC CGGAATATCAGAAG CGCCGG CTGCTGCAGG AG 
ATCCTGGAGAACTCGGAGAG CCTG CTGGAG CCCCCCGACCGGGGCGCCGGCGCAGCGG 
GCCTCCCACAGCCGCCCCGCGAGAGCCCGCAGCTGCACGAGCGTGAGACCCCTCGGCC 
CGAGGGTGGCTCCCCGTCACCGGCCGGGACGCCCCCGCAGCCCAAGCGGCCCAGGCCC 
GGGGTGTCCAAGG ACGGCCTGCTG AG CCCAGGCGCCTGG AACGGCGAG CCCAGCGGTG 
AGGGCAGCCGGTCAGTCACTCCGTCCGAGGGCGCGGGCCGCCGCAGCCCCGCGCGTCC 
AGCCACCGAGCGCATGGCCATCGAGGCTCTGCAGGCACCGCCTGCGCCGTCGCGGGAG 
CCGGAGGTGGCGCTTTACCAGGGCTACCACAGCTATGCTGTGCGCACCACGCCGCCCG 
AGCCCCCACCCTTTGAGGACCAGCCCGAGCCCGAGGTCTCCGGGTCCGAGTCCGCGCC 
CTCGTCCCCGGCCACCGCCCCGCTGCAGGCCCCCAaSCTCCGAGGCCCCGAGCCTGCA 
CGCGAGACCCCCGCCAAGCTGGAGCCCAAGCCCATCATCCCCAAAGCCGAGCCCAGGG 
CCAAGGCCCGCAAGACTG AGGC TCGAGGG CTG ACCAAGG CGGGGGCCAAG AAGAAGGC 
GCGGAAGGAGGCCGCACTGGCGGCAGAGGCGGAGGTGGAGGTGGAAGAGGTCCCCAAC 
ACCATCCTCATCTGCATGGTGATCCTGCTGAACATCGGCCTGGCCATCCTCTTTGTTC 
ACCTCCTG ACCTGACCGTCGCTTACCAGGTGCAGC CAG CTGG CTGGAGGAGGGGTTGG 


GGGGCAGGAGCCCCTGGGG 




ORF Start: ATG at 70 


ORF Stop: TGA at 2158 




SEQ ID NO: 272 


696 aa MW at 74220.7kD 






NOV95a, 

CG59756-01 Protein Sequence 



MSGGRFDFDDGGAYCX3GWEGGKAHGHGLCTGPKGQGEYSGSWNFGFEVAGVYTWPSGN 
TFEGYWSQGKRHGLGIETKGRWLYKGEWTHGFKGRYGIRQSSSSGAKYEGTWNNGLQD 
GYGTETYADGGTYQGQFTNGMRHGYGVRQSVPYGMAVVVRSPLRTSLSSLRSEHSNGT 
VAPDSPASPASDGPALPSPAIPRGGFALSLLANAEAAARAPKGGGLFQRGALLGKLRR 
AE S RTS VG SQRS R VS FLKSD LS S GAS DAASTAS LGEAAEG AD E AAP FEAD I D ATTTET 
YMGEWKNDKRSG FG VS ERS S GLR YEG EWLDNLRHG YGCTT LPDGHRE EG KYRHNVLVK 
DTKRRMLQLKSNKVRQKVEHSVEGAQRAAAIARQKAEIAASRTSHAKAKAEAAEQAAL 
AANQESNIARTLARELAPDFYQPGPEYQKRRLLQEILENSBSLLEPPDRGAGAAGLPQ 
PPRESPQLHERETPRPEGGSPSPAGTPPQPKRPRPGVSKDGLLSPGAWNGEPSGEGSR 
S VTPSEGAGRRS PARPATERMAI EALQAPPAPSRE PE VALYQG YHS YAVRTT PPE P PP 
FEDQPEPEVSGSESAPSSPATAPLQAPTLRGPEPARETPAKLEPKPI I PKAEPRAKAR 
KTEARGLTKAGAKKKARKEAALAAEAEVEVEEVPNTILI CMVI LLNIGLAI LFVHLLT 



Further analysis of the NOV95a protein yielded the following properties shown in 
Table 95B. 



Table 95B. Protein Sequence Properties NOV95a 


PSort analysis | 0.8000 probability located in nucleus; 0.7000 probability located in plasma membrane; 0.3133 probability located in microbody (peroxisome); 0.2000 probability located in endoplasmic reticulum (membrane)
SignalP analysis | No Known Signal Sequence Predicted



A search of the NOV95a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 95C. 



Table 95C. Geneseq Results for NOV95a 


Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV95a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAM79123 | Human protein SEQ ID NO 1785 - Homo sapiens, 628 aa. [WO200157190-A2, 09-AUG-2001] | 3..696; 4..628 | 293/704 (41%); 377/704 (52%) | e-127
AAM80107 | Human protein SEQ ID NO 3753 - Homo sapiens, 378 aa. [WO200157190-A2, 09-AUG-2001] | 283..696; 24..378 | 146/421 (34%); 194/421 (45%) | 2e-43
ABB21683 | Protein #3682 encoded by probe for measuring heart cell gene expression - Homo sapiens, 135 aa. [WO200157274-A2, 09-AUG-2001] | 257..389; 6..135 | 78/133 (58%); 104/133 (77%) | 7e-42
AAM57089 | Human brain expressed single exon probe encoded protein SEQ ID NO: 29194 - Homo sapiens, 135 aa. [WO200157275-A2, 09-AUG-2001] | 257..389; 6..135 | 78/133 (58%); 104/133 (77%) | 7e-42
AAM17323 | Peptide #3757 encoded by probe for measuring cervical gene expression - Homo sapiens, 135 aa. [WO200157278-A2, 09-AUG-2001] | 257..389; 6..135 | 78/133 (58%); 104/133 (77%) | 7e-42
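
The Expect values in Table 95C appear in two notations: a full form such as "2e-43" and an abbreviated form such as "e-127", in which the leading coefficient is omitted. The sketch below converts both notations to floating-point numbers so that hits can be compared numerically. It assumes that an omitted coefficient means 1; the function name is illustrative.

```python
def parse_expect(value):
    """Convert an Expect-value string such as '2e-43', 'e-127' or '0.0'."""
    text = value.strip().lower()
    if text.startswith("e"):
        # Assumption: an omitted coefficient stands for 1, so 'e-127' is 1e-127.
        text = "1" + text
    return float(text)


# Expect values from Table 95C, in table order
reported = ["e-127", "2e-43", "7e-42"]
parsed = [parse_expect(value) for value in reported]
print(parsed)
```

Once parsed, the values sort in the same order in which the hits are listed, from the smallest (most significant) Expect value to the largest.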


In a BLAST search of public sequence databases, the NOV95a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 95D. 


Table 95D. Public BLASTP Results for NOV95a 


Protein Accession Number | Protein/Organism/Length | NOV95a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q9GKY7 | JUNCTOPHILIN TYPE 2 - Oryctolagus cuniculus (Rabbit), 694 aa. | 1..696; 1..694 | 644/701 (91%); 662/701 (93%) | 0.0
Q9ET79 | JUNCTOPHILIN TYPE 2 - Mus musculus (Mouse), 696 aa. | 1..696; 1..696 | 608/706 (86%); 644/706 (91%) | 0.0
Q9BR39 | DJ1108D11.1 (NOVEL PROTEIN SIMILAR TO C. ELEGANS T22C1.7) - Homo sapiens (Human), 552 aa (fragment). | 128..672; 1..545 | 544/545 (99%); 544/545 (99%) | 0.0
Q9GKY8 | MITSUGUMIN72/JUNCTOPHILIN TYPE1 - Oryctolagus cuniculus (Rabbit), 662 aa. | 1..696; 1..662 | 364/704 (51%); 468/704 (65%) | 0.0
Q9ET80 | JUNCTOPHILIN TYPE 1 - Mus musculus (Mouse), 660 aa. | 1..696; 1..660 | 371/707 (52%); 469/707 (65%) | 0.0



PFam analysis predicts that the NOV95a protein contains the domains shown in 
Table 95E. 



Table 95E. Domain Analysis of NOV95a 


Pfam Domain | NOV95a Match Region | Identities/Similarities for the Matched Region | Expect Value
MORN: domain 1 of 7 | 14..36 | 10/23 (43%); 13/23 (57%) | 1.1
MORN: domain 2 of 7 | 38..59 | 9/23 (39%); 15/23 (65%) | 0.31
MORN: domain 3 of 7 | 60..77 | 8/23 (35%); 15/23 (65%) | 3
MORN: domain 4 of 7 | 106..128 | 11/23 (48%); 20/23 (87%) | 3.7e-06
MORN: domain 5 of 7 | 129..151 | 8/23 (35%); 15/23 (65%) | 0.027
MORN: domain 6 of 7 | 291..313 | 12/23 (52%); 19/23 (83%) | 0.00056
MORN: domain 7 of 7 | 314..336 | 11/23 (48%); 19/23 (83%) | 0.00022



Example 96. 

The NOV96 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 96A. 



Table 96A. NOV96 Sequence Analysis 




SEQ ID NO: 273 


3257 bp 


NOV96a, 

CG59708-01 DNA Sequence 


CGTAGGCGCTTCGGCCATOACTGCGGAGCTGCAGCAGGACGACGCGGCCGGCGCGGCA 

GACGGCCACGGCTCGAGCTGCCAAATGCTGTTAAATCAACTGAGAGAAATCACAGGCA 

TT CAGG AC C CTTC CTTTC TCCATG AAG CTCTG AAGG CCAG T AATGG TGAC ATT ACT CA 

GGCAGTCAGCCTTCTCACTGATGAGAGAGTTAAGGAGCCCAGTCAAGACACTGTTGCT 

ACAGAACCATCTGAAGTAGAGGGGAGTGCTGCCAACAAGGAAGTATTAGCAAAAGTTA 

TAGACCTTACTCATGATAACAAAGATGATCTTCAGGCTGCCATTGCTTTGAGTCTACT 

GGAGTCTCCCAAAATTCAAGCTGATGGAAGAGATCTTAACAGGATGCATGAAGCAACC 

TCTGCAGAAACTAAACGCTCAAAGAGAAAACGCTGTGAAGTCTGGGGAGAAAACCCCA 

ATCCCAATGACTGGAGGAGAGTTGATGGTTGGCCAGTTGGGCTGAAAAATGTTGGC^A 

TACATGTTC4GTTTAGTGCTGTTATTCAGTCTCTCTTTCAATTGCCTGAATTTCGAA 

CTTGTTCTCAGTTATAGTCTGCCACAAAATGTACTTGAAAATTGTCGAAGTCATACAG 

AAAAG AG AAAT ATC AT GTTT ATG CAAG AGCTT CAGT ATTTGTTTG CT C T AATG ATG GG 

ATCAAATAGAAAATTTGTAGACCCGTCTCCAGCCCTGGATCTATTAAAGGGAGCATTC 

CG ATCATCTG AGGAACAG CAGC AAG ATGTG AGTG AATT C AC A CAC AAG CTC CTG G ATT 

GG CTAGAGG ACGCATTCCAG CT AGCTGTTAATGTTAACAGTCCCAGGAACAAATCTGA 

AAATCCAATCGTGCAGCTGTTCTATGGTACTTTCCTGACTGAAGGGGTTCGTGAAGGA 

AAACCCTTTTGTAACAATGAGACCTTCWCCAGTATCCTCTTCAGGTAAAaSGTTATC 

GCAACTTAGACGAGTGTTTGGAAGGGGCCATCGTGGAGGGTGATCTTGAGCTTCTTCC 

CTCCGATCACTCGGTGAAGTATX5GACAAGAGCGTTGGTTTACAAAGCTACCTCCAGTG 

TTGACCTTTGAACTCTCAAGATTTGAGTTTAATCAGTCCCTTGGGCAGCCAGAGAAAA 

TTCACAATAAGCTGGAATTTCCTCAGATTATTTATATGGACAGGTACATGTACAGGAG 

CAAGGAGCTTATTCGAAATAAGAGAGAGTGTATTCGAAAGTTGAAGGAGGAAATAAAA 

ATT CTG C AG CAAAAATTG G AAAGG T ATG TG AAAT ATGG CT CAGG CC CAG CT CGG TT CC 

CGCTCCCGGACATGCTGAAATATGTTATTGAATTTGCTAGTACAAAACCTGCCTCAGA 

AAGCTGTCCACCTGAAAGTGAC^CACATATGACATTACCACTTTCTTCAGTGCACTGC 

TCGGTTTCTGACC^GACATCCAAGGAAAGTACAAGTACAGAAAGCTCTTCTCAGGATG 

TTGAAAGT ACCTTTTCTT CTCCTGAAG ATT CTTT ACCCAAGTCTAAACCACTGACATC 
TTCTCGGTCTTCCATGG AAATG CCTTCACAG CCAG CTCCACG AACAGTC ACAGATGAG 
GAGATAAATTTTGTTAAGACCTGTCTTCAGAGATGGAGGAGTGAGATTGAACAAGATA 
TACAAGATTTAAAGACTTGTATTGCAAGTACTACTCAGACTATTGAACAGATGTACTG 
CGATCCTCTC CTTCGTCAGGTG CCTTATCG CTTGCATGCAGTTCTTGTTCATGAAGGA 
CAAGCAAATGCTGGACACTATTGGGCCTATATCTATAATCAACCCCGACAGAGCTGGC 
TCAAGTACAATGACATCTCTGTTACTGAATCTTCCTGGGAAGAAGTTGAAAGAGATTC 
CTATGGAGGCCTGAGAAATGTTAGTGCTTACTGTCTGATGTACATTAATGCCAAACTA 
CC CT ACTTCAATG C AG AGGCAGCC C C AACTG AATCAG ATCAAATGTCAG AAGTGGAAG 
CCCTATCTGTGGAACTCAAGCATTACATTCAGGAGGATAACTGGCGGTTTGAGCAGGA 
AGTAG AGGAGTGGGAAGAAGAG CAGTCTTG C AAAATCCCTC AAATGG AGTCCTCCCCC 
AACTCCT CATCACAGGGCTACTCTACATCACAAGAG CCTT C AGTAGCCTCTTCTCATG 
GGGTTCGCTGCTTGTCATCTGAGCATGCTGTGATTGTAAAGGAGCAAACTGCCCAGGC 
TATTGCAAACACAGCCCGTGCC T ATGAGAAGAG CGGTGTAGAAGCGGCACTGAGTGAG 
GCATTCCATGAAGAATACTCCAGGCTCTATCAGCTTGCCAAAGAGACCCCCACCTCTC 
ACAGTGATCCTCGACTTCAGCATGTCCTTGTCTACTTTTTCCAAAATGAAGCACCCAA 
AAGGGTAGT AGAACGAAC CCTTCTGG AACAGTTTGCAGATAAAAATCTTAG CTATG AT 
GAAAGATCAATCAGCATTATGAAGGTGGCTCAAGCGAAACTGAAGGAAATTGGTCCAG 
ATGACATGAATATGGAAGAGTACAAGAGGTGGCATGAAGATTATAGTTTGTTCCGAAA 
AGTGTCTGTGTATCTCCTAACAGGCCTAGAACTCTATCAAAAAGGAAAGTACCAAGAG 
GCACTTTCCTACCTGGTAT ATG CCTAC CAGAGCAATG CTGCCCTGCTGATGAAGGGGC 
CCCGCCGGGGGGTCAAAGAATCCGTGATTGCTTTATACCGAAGAAAATGCCTTCTGGA 
GCTGAATGCCAAAGCAG CTTCT CTTTTTGAAAC AAATG ATG ATCACT CCGTAACTG AG 
GGCATT AATGTGATG AATGAACTGATC AT C C C CTGCATTCACCTT AT CATTAATAATG 
ACATTTCCAAGGATGATCTGGATGCCATTGAGGTCATGAGAAACCATTGGTGCTCTTA 
CCTTGGG CAAGATATTGCAGAAAATCTG CAGCTGTGCCTAGGGG AGTTTCTACCCAG A 
CTTCTAGATCCTTCTGCAGAAATCATCGTCTTGAAAGAGCCTCCAACTATTCGACCCA 
ATTCTCCCTATGACCTATGTAGCCGATTTGCAGCTGTCATGGAGTCAATTCAGGGAGT 
TTCAACTGTGACAGTGAAATAAGCTCCCACATGTTCAAGGCCCATTCTGGTTCCTGGC 
TGCCTGCCTCTTGCACAGAAGTTCGTTGTCATAGTGCTCACCTTGGGAAAAGGATTAG 


GTGGGCACA 




ORF Start: ATG at 17 


ORF Stop: TAA at 3152 




SEQ ID NO: 274 


1045 aa MW at 119041.7kD 


NOV96a, 

CG59708-01 Protein Sequence 


MTAELQQDDAAGAADGHGS SCQMLLNQLRE I TG I QDPS FLHEALKASNGD I TQAVSLL 
TDERVKEPSQDTVATEPSEVEGSAANKEVLAKVIDLTHDNKDDLQAAIALSLLESPKI 
QADGRDLNRMHEATSAETKRSKRKRCEVWGENPNPNDWRRVDGWPVGLKNVGNTCWFS 
AVIQSLFQLPEFRRLVLSYSLPQNVLENCRSHTEKRNIMFMQELQYLFALMMGSNRKF 
VDPSAALDLLKGAFRSS EEQQQD VS EFTHKLLDWLEDAFQLAVNVNS PRNKSENPMVQ 
LFYGTFLTEGVREGKPFCNNETFGQYPLQVNGYRNLDECLEGAMVEGDVELLPSDHSV 
KYGQERWFTKLPPVLTFELSRFEFNQSLGQPEKIHNKLEFPQIIYMDRYMYRSKELIR 
NKRECIRKLKEEI KI LQQKIiERYVKYGSGPARFPLPDMLKYVIEFASTKPASESCPPE 
SDTHMTLPLSSVHCSVSDQTSKESTSTESSSQDVESTFSSPEDSLPKSKPLTSSRSSM 
EMPSQ PAPRTVTDEE INFVKTCLQRWRS E I EQD I QDLKTC I ASTTQTI EQMYCDPLLR 
QVP YRLHAVLVHEGQANAGHYWAY I YNQPRQS WLKYND ISVTESS WE EVERDS YGGLR 
NVSAYCLMYINAKLPYFNAEAAPTESDQMSEVFJUjSVELKHYIQEDNWRFEQEVEEWE 
EEQSCKI PQMESS PNSSSQGYSTSQEPSVASSHGVRCLSSEHAVI VKEQTAQAI ANTA 
RAYEKSGVEAALSEAFHEEYSRLYOLAKETPTSHSDPRIiQHVLVYFFQNEAPKRWER 
TLLEQFADKNLS YDERS IS I MKVAQAKLKE IGPDDMNMEEYKRWHEDYSLFRKVS VYL 
LTGLELYQKGKYQEALSYLVYAYQSNAALLMKGPRRGVKESVIALYRRKCLLELNAKA 
ASLFETNDDHSVTEGINVMNELI IPCIHLI INNDI SKDDLDAI EVMRNHWCSYLGQDI 
AENLQLCLGEFLPRLLDPSAEIIVLKEPPTIRPNSPYDLCSRFAAVMESIQGVSTVTV 
K 




SEQ ID NO: 275 


3044 bp 


NOV96b, 

CG59708-02 DNA Sequence 


CGTAGGCGCTTCGGCCATGACTGCGGAGCTGCAGCAGGACGACGCGGCCGGCGCGGCA 
GACGGCCACGGCTCGAGCTGCCAAATGCTGTTAAATCAACTGAGAGAAATCACAGGCA 
TTCAGGACCCTTCCTTTCTCCATGAAGCTCTGAAGGCCAGTAATGGTGACATTACTCA 
GG CAGTCAGCCTTCTCACTG ATGAG AG AGTTAAGG AGCCCAGTC AAGACACTGTTG CT 
ACAGAACCATCTGAAGTAGAGGGGAGTGCTGCCAACAAGGAAGTATTAGCAAAAGTTA 
TAGACCTTACTCATGATAACAAAGATGATCTTCAGGCTGCCATTGCTTTGAGTCTACT 
GG AGTCTCCCAAAATTCAAGCTG ATGG AAG AG ATCTT AACAGG ATG CATGAAG CAACC 
TCTGCAGAAACTAAACG CTCAAAG AGAAATATCATGTTTATGCAAG AG CTTCAGTATT 
TGTTTG CTCT AATG ATGGG AT CAAAT AG AAAATTTGT AG AC C CGT CTG CAG CC CTGG A 
TCT ATT AAAGGG AG C ATTC CG AT CATCTG AG G AACAG CAG C AAG ATG TG AGTG AATT C 
ACACACAAGCTCCTGGATTGG CTAG AGGACGCATTCCAG CTAGCTGTT AATGTTAACA 
GTCCCAGGAACAAATCTGAAAATCCAATGX3TGCAGCrGTTCTATGGTACTTTCCTGAC 
TG AAGGGGTTCGTGAAGGAAAACCCTTTTGTAACAATGAGACCTTCGGC CAGTAT CCT 
CTTCAGGTAAACGGTTATCGCAACTTAGACGAGTGTTTGGAAGGGGCCATGGTGGAGG 
GTGATGTTGAGCTTCTTCCCTCCGATCACTCGGTGAAGTATGGACAAGAGCGTTGGTT 
TACAAAGCTACCTCCAGTGTTGACCTTTGAACTCTCAAGATTTGAGTTTAATCAGTCC 
CTTGGG CAG CCAG AG AAAATTCACAAT AAG CTGG AATTTCCTCAG ATT ATTT AT ATGG 
ACAGGTACATGTACAGGAGCAAGGAGCTTATTCGAAATAAGAGAGAGTGTATTCGAAA 
GTTGAAGGAGGAAATAAAAATTCTGCAGCAAAAATTGGAAAGGTATGTGAAATATGGC 
TCAGGCCCAGCTCGGTTCCCGCTCC CGG ACATG CTG AAATATGTTATTGAA1TTGCT A 
GTACAAAACCTGCCTCAGAAAGCTGTCCACCTGAAAGTGACACACATATGACATTACC 
ACTTTCTTCAGTGCACTGCTCGGTTTCTGACCAGACATCCAAGGAAAGTACAAGTACA 
GAAAGCTCTTCTCAGGATGTTGAAAGTACCTTTTCTTCTCCTGAAGATTCTTTACCCA 
AGTCTAAACCACTGACATCTTCTCGGTCTTCCATGGAAATGCCTTCACAGCCAGCTCC 
ACGAACAGTCACAGATGAGGAGATAAATTTTGTTAAGACCTGTCTTCAGAGATGGAGG 
AGTGAGATTGAACAAGATATACAAGATTTAAAGACTTGTATTGCAAGTACTACTCAGA 
CTATTGAACAGATGTACTGCGATCCTCTCCTTCGTCAGGTGCCTTATCGCTTGCATGC 
AGTTCTTGTTCATGAAGGACAAGCAAATGCTGGACACTATTGGGCCTATATCTATAAT 
CAACCCCGACAGAGCTGGCTCAAGTACAATGACATCTCTGTTACTGAATCTTCCTGGG 
AAGAAGTTGAAAGAGATTCCTATGGAGGCCTGAGAAATGTTAGTGCTTACTGTCTGAT 
GTACATTAATGCCAAACTACCCTACTTCAATGCAGAGGCAGCCCCAACTGAATCAGAT 
CAAATGTCAGAAGTGGAAGCCCTATCTGTGGAACTCAAGCATTACATTCAGGAGGATA 
ACTGGCGGTTTGAGCAGGAAGTAGAGGAGTGGGAAGAAGAGCAGTCTTGCAAAATCCC 
TCAAATGGAGTCCTCCCCCAACTCCTCATCACAGGGCTACTCTACATCACAAGAGCCT 
TCAGTAGCCTCTTCTCATGGGGTTCGCTGCTTGTCATCTGAGCATGCTGTGATTGTAA 
AGGAGCAAACTGCCCAGGCTATTGCAAACACAGCCCGTGCCTATGAGAAGAGCGGTGT 
AG AAGCGG CACTG AGTGAGGCATTCCATGAAGAATACTCCAGG CTCTAT CAGCTTG CC 
AAAGAGACCCCCACCTCTCACAGTGATCCTCGACTTCAGCATGTCCTTGTCTACTTTT 
TCCAAAATGAAGCACCCAAAAGGGTAGTAGAACGAACCCTTCTGGAACAGTTTGCAGA 
T AAAAATCTTAGCTATG ATGAAAG ATCAAT CAG CATTATGAAGGTGGCTCAAG CGAAA 
CTGAAGGAAATTGGTCCAGATGACATGAATATGGAAGAGTACAAGAGGTGGCATGAAG 
ATT ATAGTTTGTTC CGAAAAGTGTCTGTGT AT CTCCTAACAGGCCT AGAACTCT ATC A 
AAAAGGAAAGTACCAAGAGGCACTTTCCTACCTGGTATATGCCTACCAGAGCAATGCT 
GCCCTGCTGATGAAGGGGCCCCGCCGGGGGGTCAAAGAATCCGTGATTGCTTTATACC 
GAAGAAAATGCCTTCTGGAGCTGAATGCCAAAGCAGCTTCTCTTTTTGAAACAAATGA 
TGATCACTCCGTAACTGAGGG CATTAATGTG ATGAATG AACTGATCATC CCCTGCATT 
CACCTTATCATTAATAATGAC ATTTCCAAGG ATGATCTGG ATG C CATTG AGGT CATG A 
GAAACCATTGGTGCTCTTACCTTGGGCAAGATATTGCAGAAAATCTGCAGCTGTGCCT 
AGGGGAGTTTCT ACC CAGACTTCT AGATCCTTCTGCAG AAATCAT CGTCTTGAAAG AG 
C CT CC AACT ATT CG ACCCAATT CT C C C T ATG AC CT ATG TAG C CG ATTTGCAG CTGT CA 
TGGAGTCAATTCAGGGAGTTTCAACTGTGACAGTGAAATAAGCTCCCACATGTTCAAG 
GCCCATTCTGGTTCCTGGCTGCCTGCCTCTTGCACAGAAGTTCGTTGTCATAGTGCTC 


ACCTTGGGAAAAGGATTAGGTGGGCACA 




ORF Start: ATG at 17 


ORF Stop: TAA at 2939 




SEQ ID NO: 276 


974 aa MW at 110687.3kD 


NOV96b, 

CG59708-02 Protein Sequence 


MTAELQQDDAAGAADGHGS SCQMLLNQLRE ITGIQDPS FLHEALKASNGDI TQAVSLL 
TD ER VKE PS QDT VAT E PS E V EG S AANKE VLAKV I D LTHDN KDDLQAA I ALS LLE S P K I 
QADGRDLNRMHEATSAETKRSKRNIMFMQELQYLFALMMGSffRKFVDPSAALDLLKGA 
FRSSEEQQQDVSEFTHKLLDWLEDAFQLAVNVNSPRNKSENPMVQLFYGTFLTEGVRE 
GKPFCNNETFGQYPLQVNGYRNLDECLEGAMVEGDVELLPSDHSVKYGQERWFTKLPP 
VLT F E LSR F E FNQS LGQ PE K I HN KLE F PQ 1 1 YMDR YMYRS KE LI RNKREC I RKLKE E I 
KI LQQKLERYVKYGSGPARFPLPDMLKYVI EFASTKPAS E SCPPESDTHMTLPLS SVH 
CSVSDQTSKESTSTESSSQDVESTFSSPEDSLPKSKPLTSSRSSMEMPSQPAPRTVTD 
EE INFVKTCLQRWRSEI EQD I QDLKTC I ASTTQTI EQMYCDPLLRQVPYRLHAVLVHE 
GQANAGHYWAYIYNQPRQSWIiKYNDISVTESSWEEVERDSYGGLRNVSAYCLMYINAK 
LPYFNAEAAPTESDQMSEVEALSVELKHYIQEDNWRFEQEVEEWEEEQSCKIPQMESS 
PNS S SQGYSTS QE PS VAS S HG VRCL S S EHA VI VKEQT AQAI ANT ARAYE KSGVEAALS 
EAFH E E YS RLYQLAKET PT S HSD PRLQHVL VY FFQNE AP KRWE RTLLEQFAD KNL S Y 
DERSISIMKVAQAKLKEIGPDDMNMEEYKRWHEDYSLFRKVSVYLLTGLELYQKGKYQ 
EALSYIiVYAYQSNAALLMKGPRRGVKESVIALYRRKCLLELNAKAASLFETNDDHSVT 
EGINVMNELI I PCIHLI INNDISKDDLDAIEVMRNHWCS YLGQDI AENLQLCLGEFLP 
RLLDPSAEIIVLKEPPTIRPNSPYDLCSRFAAVMESIQGVSTVTVK 




SEQ ID NO: 277 


3231 bp 


NOV96c, 

CG59708-03 DNA Sequence 


GCGCTTCGGCCATOACTGCGGAGCTGCAGCAGGACGACGCGGCCGGCGCGGCAGACGG 
C C ACGGCTCGAGCTGCCAAATG CTGTTAAATCAACTG AGAG AAATCACAGG CATTCAG 
G ACCCTTCCTTTCTCCATGAAG CTCTGAGGG CCAGTAATGGTG ACATTACTC AGGCAG 
TCAGCCTTCTCACTGATGAGAGAGTTAAGGAGCCCAGTCAAGACACTGTTGCTACAGA 
ACCATCTGAAGTAGAGGGGAGTGCTGCCAACAAGGAAGTATTAGCAAAAGTTATAGAC 
CTTACTCATG AT AACAAAGATGATCTTCAGG CTGC CATTGCTTTG AGTCTACTGGAGT 
CTCCCAAAATTCAAGCTGATGGAAGAGATCTTAACAGGATGCATGAAGCAACCTCTGC 
AGAAACrAAACGCTCAAAGAGAAAACGCTGTGAAGTCTGGGGAGAAAACCCCAATCCC 
AATGACTGGAGGAGAGTTGATGGTTGGCCAGTTGGGCTGAAAAATGTTGGCAATACAT 
GTTGGTTTAGTGCTGTTATTCAGTCTCTCTTTCAATTGCCTGAATTTCGAAGACTTGT 
TCTCAGTTATAGTCTGCCACAAAATGTACTTGAAAATTGTCGAAGTCATACAGAAAAG 
AGAAATATCATGTTTATGCAAGAGCTTCAGTATTTGTTTGCTCTAATGATGGGATCAA 
ATAGAAAATTTGTAGACCCGTCTGCAGCCCTGGATCTATTAAAGGGAGCATTCCGATC 
ATCTGAGG AAC AGCAG C AAG ATGTG AGTG AATTCACACAC AAGCTC CTGGATTGG CTA 
GAGGACGCATTCCAGCTAGCTGTTAATGTTAACAGTCCCAGGAACAAATTTGAAAATC 
CAATGGTGCAGCTGTTCTATGGTACTTTCCTGACTGAAGGGGTTCGTGAAGGAAAACC 
CTTTTGT AACAATGAGACCTTCGG CCAGTAT CCTCTTCAGGT AAACGGTT ATCGCAAC 
TTAGACGAGTGTTTGGAAGGGGCCATGGTGGAGGGTGATGTTGAGCTTCTTCCCTCCG 
ATCACTCGGTGAAGTATGGACAAGAGCGTTGGTTTACAAAGCTACCTCCAGTGTTGAC 
CTTTGAACTCTCAAGATTTGAGTTTAATCAGTCCCTTGGGCAGCCAGAGAAAATTCAC 
AATAAG CTGG AATTTCCTCAG ATT ATTT AT ATGGACAGGT ACATGT ACAGGAG CAAGG 
AGCTTATTCGAAATAAGAGAGAGTGTATTCGAAAGTTGAAGGAGGAAATAAAAATTCT 
GCAGCAAAAATTGGAAGGGTATGTGAAATATGGCTCAGGCCCAGCTCGGTTCCCGCTC 
CCGGACATG CTG AAATATGTTATTG AATTTGCT AGTACAAAACCTG CCTCAGAAAG CT 
GTCCACCTGAAAGTG ACACACATATG ACATTAC CACTTTCTTCAGTGCACTG CTCGGT 
TTCTAACCAGACATCCAAGGAAAGTACAAGTACAGAAAGCTCTTCTCAGGATGTTGAA 
AGTACCTTTTCTTCTCCTGAAGATTCTTTACCCAAGTCTAAACCACTGACATCTTCTC 
GGTCTTCCATGGAAATGCCTTCACAGCCAGCTCCACGAACAGTCACAGATGAGGAGAT 
AAATTTTGTTAAGACCTGTCTTCAGAGATGGAGGAGTGAGATTGAACAAGATATACAA 
GATTT AAAG ACTTGTATTGCAAGT ACTACT CAGACTATTG AACAGATGTACTG CGATC 
CTCTCCTTCGTCAGGTGCCTTATCGCTTGCATGCAGTTCTTGTTCATGAAGGACAAGC 
AAATGCTGGACACTATTGGGCCTATATCTATAATCAACCCCGACAGAGCTGGCTCAAG 
TACAATGACATCTCTGTTACTGAATCTTCCTGGGAAGAAGTTGAAAGAGATTCCTATG 
GAGGCCTGAGAAATGTTAGTGCTTACTGTCTGATGTACATTAACGACAAACTACCCTA 
CTTCAATGCAGAGGCAGCCCCAACTGAATCAGATCAAATGTCAGAAGTGGAAGCCCTA 
TCTGTGGAACTCAAGCATTACATTCAGGAGGATAACTGGCGGTTTGAGCAGGAAGTAG 
AGGAGTGGGAAGAAGAGCAGTCTTGCAAAATCCCTCAAATGGAGTCCTCCACCAACTC 
CTCATCACAGGACTACTCTACATCACAAGAGCCTTCAGTAGCCTCTTCTCATGGGGTT 
CGCTGCTTGTCATCTGAGCATGCTGTGATTGTAAAGGAGCAAACTGCCCAGGCTATTG 
CAAACACAGCCCGTGCCTATGAGAAGAGCGGTGTAGAAGCGGCACTGAGTGAGGCATT 
CCATGAAGAATACTCCAGG CTCTATCAGCTTG CC AAAG AGACC CCCACCTCTCACAGT 
GATCCTCGACTTCAGCATGTCCTTGTCTACTTTTTCC AAAATGAAG CAC CCAAAAGGG 
TAGTAG AACGAACCCTTCTGG AACAGTTTGCAGATAAAAAT CTTAG CTATG ATG AAAG 
ATCAATCAG CATTATGAAGGTGG CTCAAGCGAAACTGAAGGAAATTGGT CCAGATG AC 
ATGAATATGGAAGAGTACAAGAAGTGGCATGAAGATTATAGTTTGTTCCGAAAAGTGT 
CTGTGTATCTCCTAACAGGCCTAGAACTCTATCAAAAAGGAAAGTACCAAGAGGCACT 
TTCCTACCTGGTATATGCCTACCAGAGCAATGCTGCCCTGCTGATGAAGGGGCCCCGC 
CGGGGGGTCAAAGAATCCGTGATTGCTTTATACCGAAGAAAATGCCTTCTGGAGCTGA 
ATGCCAAAG CAGCTTCTCTTTTTGAAACAAATG ATG ATCACTCCGTAACTG AGGG CAT 
TAATGTGATGAATGAACTGATCATCCCCTGCATTCACCTTATCATTAATAATGACATT 
TCCAAGGATGATCTGGATGCCATTGAGGTCATGAGAAACCATTGGTGCTCTTACCTTG 
GGCAAGATATTGCAGAAAATCTGCAGCTGTGCCTAGGGGAGTTTCTACCCAGACTTCT 
AGATCCTTCTGCAGAAATCATCGTCTTGAAAGAGCCTCCAACTATTCGACCCAATTCT 
CCCTATGACCTATGTAGCCGATTTGCAGCTGTCATGGAGTCAATTCAGGGAGTTTCAA 
CTGTGACAGTGAAATAAGCTCCCACATGTTCAAGGCCCATTCTGGTTCCTGGCTGCCT 
GCCTCTTGCACAGAAGTTCGTTGTCATAGTGCTCACCTTGG 




ORF Start: ATG at 12 


ORF Stop: TAA at 3147 




SEQ ID NO: 278 


1045 aa MW at 119107.7kD 


NOV96c, 

CG59708-03 Protein Sequence 


MTAELQQDD AAGAADGHGS SCQMLLNQLR E I TG I QD PS F LHEALRASNGD I TQAVSLL 
TDERVKEPSQDTVATEPSE VEGSAANKE VLAKVI DLTHDNKDDLQAAIALS LLES PKI 
QADGRDLNRMHEATSAETKRSKRKRCEWGENPNPNDWRRVDGWPVGLKNVGNTCWFS 
A VI QS LFQL PEFRRL VLS YSL PQNVLENCRSHTE KRN I MFMQE LQ YLFALMMG SNRKF 
VDPSAALDLIjKGAFRSSEEQQQDVSEFTHICLLDWLEDAFQLAVNVNSPRNKFENPMVQ 
LFYGTFLTEGVREGKPFCNNETFGQYPLQVNGYRNLDECLEGAMVEGDVEIiLPSDHSV 
KYGQERWFTKLPP VLTFELSRFEFNQSLGQPEKIHNKLEFPQI I YMDRYMYRS KEL I R 
NKRECIRKLKEEIKILQQKLEGYVKYGSGPARFPLPDMLKYVIEFASTKPASESCPPE 
SDTHMTLPLSSVHCSVSNQTSKESTSTESSSQDVESTFSSPEDSLPKSKPLTSSRSSM 
EMPSQPAPRTVTDEEINFVKTCLQRWRSEIEQDIQDLKTCIASTTQTIEQMYCDPLLR 
QVPYRLHAVLVHEGQANAGHYWAYI YNQPRQSWLKYND I SVTESSWEEVERDS YGGLR 
NVS A YCIjMY I NDKLP YFNAEAAPTE SDQMS E VEAL S VE L KHY I QE DNWR FEQE VEEWE 
EEQSCKIPQMESSTNSSSQDYSTSQEPSVASSHGVRCLSSEHAVIVKEQTAQAIANTA 
RA YEKSGVEAALS EA FHE E YS RLYQIiAKET PTS HS DPRLQHVLVYF FQNEAPKRWE R 
TLLEQFADKNLSYDERSISIMKVAQAKLKEIGPDDMNMEEYKKWHEDYSLFRKVSVYL 
LTGLELYQKGKYQEALSYLVYAYQSNAALLMKGPRRGVKESVIALYRRKCLLELNAKA 
ASLFET^n}DHSVTEGINVMNELIIPCIHLIINNDISKBDLDAIEVMRNHWCSYLGQOI 
AENI^LCLGEFLPRLUDPSAEIIVLKEPPTIRPNSPYDLCSRFAAVMESIC^VSTVTV 
K 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 96B. 



Table 96B. Comparison of NOV96a against NOV96b through NOV96c. 


Protein Sequence | NOV96a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV96b | 209..1045; 138..974 | 805/837 (96%); 805/837 (96%)
NOV96c | 1..1045; 1..1045 | 979/1045 (93%); 981/1045 (93%)
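
The lengths reported in Table 96A and the alignment coordinates reported in Table 96B can be compared to see how NOV96b relates to NOV96a. The arithmetic below is a sketch under the assumption that the reported lengths and coordinates are accurate as printed; the variable names are illustrative.

```python
# Lengths reported in Table 96A
NOV96A = {"dna_bp": 3257, "protein_aa": 1045}
NOV96B = {"dna_bp": 3044, "protein_aa": 974}

# Difference in polypeptide length and in nucleotide length
aa_difference = NOV96A["protein_aa"] - NOV96B["protein_aa"]
bp_difference = NOV96A["dna_bp"] - NOV96B["dna_bp"]

# Table 96B aligns NOV96a residues 209..1045 with NOV96b residues 138..974,
# so the coordinate offset between the two sequences is:
alignment_offset = 209 - 138

# An in-frame difference removes exactly three nucleotides per residue.
in_frame = bp_difference == 3 * aa_difference

print(aa_difference, bp_difference, alignment_offset, in_frame)
```

One reading consistent with these figures is that NOV96b lacks an in-frame segment relative to NOV96a, with the residue-count difference matching the coordinate offset of the alignment in Table 96B.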



Further analysis of the NOV96a protein yielded the following properties shown in 
Table 96C. 



Table 96C. Protein Sequence Properties NOV96a 


PSort analysis | 0.8800 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis | No Known Signal Sequence Predicted



A search of the NOV96a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 96D. 



Table 96D. Geneseq Results for NOV96a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV96a 
Residues/

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAE04874 


Human protease protein-1 (PRTS-1) -
Homo sapiens, 1055 aa. 
[WO200146443-A2, 28-JUN-2001] 


22..1036
18..1047 


524/1035 (50%) 
713/1035 (68%) 


0.0 


AAB31552 


A human ubiquitin specific protease 
25 (USP25) - Homo sapiens, 1055 aa. 
[WO200079267-A2, 28-DEC-2000] 


22..1036
18..1047


524/1035 (50%) 
713/1035 (68%) 


0.0 


AAB31546 


A human ubiquitin specific protease 
25 (USP25) - Homo sapiens, 1055 aa. 
[WO200078934-A2, 28-DEC-2000] 


22..1036
18..1047


524/1035 (50%) 
713/1035 (68%) 


0.0 


AAB74491 


Human SYK kinase binding protein 
SYK-UBP isoform 1 - Homo sapiens, 
1055 aa. [WO200121654-A2, 29- 
MAR-2001] 


22..1036
18..1047 


522/1035 (50%) 
710/1035 (68%) 


0.0 


AAB31556 


A human ubiquitin specific protease 
(USP) - Homo sapiens, 1087 aa. 
[WO200079267-A2, 28-DEC-2000] 


22..1036
18..1079


525/1067 (49%) 
717/1067 (66%) 


0.0



In a BLAST search of public sequence databases, the NOV96a protein was found to
have homology to the proteins shown in the BLASTP data in Table 96E. 



Table 96E. Public BLASTP Results for NOV96a 


Protein
Accession 
Number 


Protein/Organism/Length 


NOV96a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96RU2 


UBIQUITIN SPECIFIC PROTEASE -
Homo sapiens (Human), 1077 aa.


1..1045
1..1077


1041/1077 (96%) 

1 OAO/ 1 f>77 (QffiA\ 


0.0 


Q9P213


KIAA1515 PROTEIN - Homo sapiens 
(Human), 757 aa (fragment). 


304..1045
16..757 


738/742 (99%) 
739/742 (99%)


0.0 


P57080


Ubiquitin carboxyl-terminal hydrolase 
25 (EC 3.1.2.15) (Ubiquitin 
thiolesterase 25) (Ubiquitin-specific 
processing protease 25) 
(Deubiquitinating enzyme 25) 
(mUSP25) - Mus musculus (Mouse), 
1055 aa. 


22..1036
18..1047


527/1033 (51%)
710/1033 (68%)


0.0 


Q9UHP3


Ubiquitin carboxyl-terminal hydrolase
25 (EC 3.1.2.15) (Ubiquitin
thiolesterase 25) (Ubiquitin-specific 
processing protease 25) 
(Deubiquitinating enzyme 25) (USP on 
chromosome 21) - Homo sapiens 
(Human), 1087 aa. 


22..1036
18..1079


525/1067 (49%) 
717/1067 (66%) 


0.0 


Q9H9W1


CDNA FLJ12512 FIS, CLONE
NT2RM2001730, WEAKLY 
SIMILAR TO PROBABLE 
UBIQUITIN CARBOXYL- 
TERMINAL HYDROLASE K02C4.3 
(EC 3.1.2.15) - Homo sapiens 
(Human), 737 aa. 


313..1036 
2..729 


363/733 (49%)
510/733 (69%) 


0.0 



PFam analysis predicts that the NOV96a protein contains the domains shown in the 
Table 96F. 



Table 96F. Domain Analysis of NOV96a 


Pfam Domain 


NOV96a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value


UIM: domain 1 of 1


96..113 


9/18 (50%) 
14/18(78%) 


8.4 


UCH-1: domain 1 of 1 


162..193 


14/32 (44%) 
28/32 (88%) 


2.6e-11


UCH-2: domain 1 of 1 


580..649 


26/72 (36%) 
56/72 (78%) 


1.5e-19 



Example 97. 

The NOV97 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 97A. 



Table 97A. NOV97 Sequence Analysis 




SEQ ID NO: 279 


1601 bp 


NOV97a,

CG59559-01 DNA Sequence 


AGGGCAGAGGCCACAGCGCCATCCCCTTCCCCATGGTCTCCCTACCCCCAACCTGCAC 


TGGGCGCTCCGCCCAGAGGTGAGTCCCTCCCAGCCCTTCTCTCCTTCTGTCCTAGCCA 


TCCGCAGAGCCATCCTGTGCAAAGGAAGGAGCTAGGCTGTGCGCCCTGGGCGTCATGA 


TCCTTCTGCGGGCCTCCGAAGTGCGGCAGCTGCTTCACAATAAGTTCGTGGTCATCCT 
GGGGGACTCTGTGCATAGGGCAGTATACAAGGACCTGGTGCTTCTGCTGCAGAAGGAC 
CGCCTGCTCACTCCCGGGCAGCTTAGAGCAAGGGGGGAGCTGAACTTCGAACAAGATG
AGCTGGTGGACGGAGGCCAGCGGGGCCACATGCACAACGGCCTTAACTACCGTGAGGT
CCGCGAGTTCCGCTCCGACCACCATCTGGTACGTTTTTACTTCCTCACCCGCGTGTAC
TCCGATTACCTCCAGACCATCTTGAAAGAGCTGCAGTCGGGCGAGCACGCCCCCGACC
TGGTCATCATGAATTCCTGCCTCTGGGACATCTCCAGGTATGGTCCGAACTCCTGGAG 
AAGCTACCTGGAGAACCTGGAGAACCTGTTCCAGTGCCTGGGCCAGGTGCTGCCCGAG 
TCTTGCCTCCTGGTGTGGAACACGGCCATGCCTGTGGGCGAGGAAGTCACCGGGGGTT 
TTCTTCCGCCCAAGCTCCGGCGGCAGAAGGCCACCTTCCTGAAAAACGAAGTGGTCAA 
AGCCAACTTCCACAGCGCCACCGAGGCACGTAAACATAACTTCGATGTACTGGACTTG 
CATTTCCACTTCCGCCACGCGAGGGAGAACCTGCACTGGGACGGGGTGCACTGGAATG 
GACGTGTGCACCGCTGCCTCTCCCAGCTGCTGCTGGCCCACGTGGCCGACGCCTGGGG 
TGTGGAGCTGCCCCACCGCCACCCCGTGGGCGAGTGGATCAAGAAGAAAAAACCTGGC
CCGAGAGTCGAAGGGCCGCCCCAGGCCAACAGAAATCACCCGGCCTTACCTCTGTCCC 
CACCCTTACCTTCCCCCACATACCGCCCCCTGCTTGGGTTCCCACCCCAGCGCTTGCC 
GCTGCTCCCGCTCCTGTCCCCACAGCCTCCTCCTCCCATTCTCCATCACCAGGGAATG 
CCCCGGTTCCCACAGGGTCCCCCAGATGCCTGTTTTTCCTCAGACCATACTTTCCAGT
CGGATCAATTCTATTGCCATTCAGATGTCCCCTCATCAGCCCATGCAGGTTTCTTCGT
CGAAGACAATTTTATGGTTGGTCCTCAGCTGCCTATGCCCTTCTTCCCCACACCCCGT
TATCAGCGGCCTGCCCCAGTGGTACATAGGGGTTTTGGCAGGTATCGTCCCCGTGGCC
CCTATACGCCCTGGGGACAGCGGCCTCGACCTTCAAAGAGAAGGGCCCCAGCCAATCC
TGAGCCAAGGCCTCAATAGACGGACCTAGGCCTTATTTCCTCTTTATGAACATGGATT 


GGACAGATCTGACACTTCCrTTCCATTGCTTGGCCTGAACAGACTGACCTTGTTAACT 


TAAGCCTGGAGTCCATGCCTCGTCTTCCTTTTGTT 




ORF Start: ATG at 171 


ORF Stop: TAG at 1467 




SEQ ID NO: 280 


432 aa MW at 49726.6kD 


NOV97a, 

CG59559-01 Protein Sequence 


MILLRASEVRQLLHNKFVVILGDSVHRAVYKDLVLLLQKDRLLTPGQLRARGELNFEQ
DELVDGGQRGHMHNGLNYREVREFRSDHHLVRFYFLTRVYSDYLQTILKELQSGEHAP
DLVIMNSCLWDISRYGPNSWRSYLENLENLFQCLGQVLPESCLLVWNTAMPVGEEVTG
GFLPPKLRRQKATFLKNEVVKANFHSATEARKHNFDVLDLHFHFRHARENLHWDGVHW
NGRVHRCLSQLLLAHVADAWGVELPHRHPVGEWIKKKKPGPRVEGPPQANRNHPALPL
SPPLPSPTYRPLLGFPPQRLPLLPLLSPQPPPPILHHQGMPRFPQGPPDACFSSDHTF
QSDQFYCHSDVPSSAHAGFFVEDNFMVGPQLPMPFFPTPRYQRPAPVVHRGFGRYRPR
GPYTPWGQRPRPSKRRAPANPEPRPQ
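
The ORF coordinates and the stated polypeptide length in Table 97A can be cross-checked arithmetically. The sketch below is illustrative only: the helper name `orf_protein_length` is hypothetical, and it assumes that "ORF Start" gives the 1-based position of the first base of the ATG codon and "ORF Stop" gives the 1-based position of the first base of the stop codon. Under that assumption the NOV97a coordinates (171 and 1467) give the 432 residues reported for SEQ ID NO: 280.

```python
def orf_protein_length(orf_start: int, orf_stop: int) -> int:
    """Number of amino acids encoded between the start and stop codons.

    Assumptions (not stated in the source): both coordinates are 1-based,
    orf_start is the first base of the ATG codon, and orf_stop is the first
    base of the stop codon, so the coding region is orf_stop - orf_start bases.
    """
    coding_bases = orf_stop - orf_start
    if coding_bases <= 0:
        raise ValueError("stop codon must lie downstream of the start codon")
    if coding_bases % 3 != 0:
        raise ValueError("coding region is not a whole number of codons")
    return coding_bases // 3


if __name__ == "__main__":
    # NOV97a: ORF Start ATG at 171, ORF Stop TAG at 1467.
    print(orf_protein_length(171, 1467))  # 432
```

The same arithmetic agrees with the other clones in this section, for example NOV98a (101 and 932, 277 aa).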



Further analysis of the NOV97a protein yielded the following properties shown in 
Table 97B. 



Table 97B. Protein Sequence Properties NOV97a


PSort
analysis: 


0.5937 probability located in mitochondrial matrix space; 0.5103 probability 
located in microbody (peroxisome); 0.4900 probability located in nucleus; 
0.3252 probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV97a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 97C.



Table 97C. Geneseq Results for NOV97a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV97a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG74241 


Human colon cancer antigen protein 
SEQ ID NO:5005 - Homo sapiens, 
281 aa. [WO200122920-A2, 05-APR-
2001]


34..294
1..266


162/268 (60%) 
191/268 (70%)


1e-82


AAE03639 


Human extracellular matrix and cell 
adhesion molecule-3 (XMAD-3) - 
Homo sapiens, 386 aa. 
[WO200142285-A2, 14-JUN-2001] 


1..421 
1..366 


197/435 (45%) 
231/435(52%) 


2e-82 



In a BLAST search of public sequence databases, the NOV97a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 97D.



Table 97D. Public BLASTP Results for NOV97a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV97a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96HM7


SIMILAR TO HYPOTHETICAL 
PROTEIN FLJ22376 - Homo sapiens 
(Human), 432 aa. 


1..432 
1..432 


432/432 (100%)
432/432 (100%)


0.0 


Q96B20


HYPOTHETICAL 31.4 KDA
PROTEIN - Homo sapiens (Human), 
279 aa. 


121..310 
1..190 


190/190(100%) 
190/190(100%) 


e-116 


Q9H1Q7


BA12M19.1.3 (NOVEL PROTEIN)
(CDNA FLJ31791 FIS, CLONE
NT2RI2008749, WEAKLY SIMILAR 
TO SPLICEOSOME ASSOCIATED 
PROTEIN 49) - Homo sapiens 
(Human), 454 aa. 


1..421 
18..434 


234/437(53%) 
273/437(61%) 


e-111 


Q9H1Q6 


BA12M19.1.1 (NOVEL PROTEIN) - 
Homo sapiens (Human), 403 aa. 


1..421 
18..383 


197/435(45%) 
231/435 (52%)


7e-82 


Q9H6D1 


CDNA: FLJ22376 FIS, CLONE 
HRC07327 - Homo sapiens (Human), 
403 aa. 


1..421 
18..383 


196/435 (45%)
231/435 (53%)


1e-81



PFam analysis predicts that the NOV97a protein contains the domains shown in the 
Table 97E. 



Table 97E. Domain Analysis of NOV97a



Pfam Domain 



NOV97a Match Region 



Identities/ 
Similarities 
for the Matched Region 



Expect Value 



No Significant Matches Found 



Example 98. 

The NOV98 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 98A.



Table 98A. NOV98 Sequence Analysis




SEQ ID NO: 281


981 bp 


NOV98a,

CG59669-01 DNA Sequence 


GCGCCGGGTCCCAGAATCTAGTCCTACGCCACGGTTTTGACCACGCGTGACCCGCTGC 


CCAGCCGGCCCGGCCATCAGGTGGTCCGTGTGTCCCTCTGACATGTCGTCCTGCAGCC 


GCGTGGCCCTGGTAACTGGGGCTAACAAAGGCATCGGCTTTGCGATCACGCGTGACCT 
GTGTCGGAAATTCTCCGGGGACGTGGTGCTCACGGCGCGGGACGAGGCGCGGGGCCGC 
GCGGCGGTGCAGCAGCTGCAGGCGGAGGGCCTGAGCCCACGCTTCCACCAGCTGGACA 
TCGACGACCCGCAGAGCATCCGTGCGCTGCGCGACTTTCTGCGCAAGGAGTACGGGGG 
ACTTAACGTGCTGGTCAACAACGCGGG PATPfifTTTTARA Af!T A CTfl AT rTPL rppa C* 
TTTCA(^TTCTAAGAGAAGCTGCAATGAAAACTAACTTTTTTC 

GCACAGAGCTACTCCCTCTAATAAAAACCCAAGGTAGAGTGGTGAATATATCAAGCCT 
AATAAGTCTAGAGGCCCTGAAAAACTGCAGCCTGGAGCTACAGCAGAAGTTTCGAAGT 
GAGACCATCACAGAGGAGGAGCTGGTGGGGCTCATGAACAAGTTTGTGGAGGATACAA 
AGAAAGGAGTCCATGCAAAAGAAGGCTGGCCTAATAGTGCATACGGGGTGTCTAAGAT 
TGGAGTGACAGTCCTGTCCAGAATCCTTGCCAGGAAACTCAATGAGCAGAGGAGAGGG 
GACAAGATCCTTCTGAATGCCTGCTGCCCTGGCTGGGTCAGAACCGACATGGCAGGAC
CACAAGCCACCAAAAGCCCAGAAGAAGGAGCAGAGACCCCTGTGTACTTGGCCCTTTT 
GCCTCCAGATGCAGAGGGACCTCATGGGCAGTTTGTTCAAGATAAAAAAGTGGAACAA 
TGGTGAACTCAGCTCTTTGTACAGCTCCCATCTGTAGCCTGTCCTAAAGGGGA 




ORF Start: ATG at 101 


ORF Stop: TGA at 932 




SEQ ID NO: 282


277 aa 


MW at 30547.7kD 


NOV98a, 

CG59669-01 Protein Sequence 


MSSCSRVALVTGANKGIGFAITRDLCRKFSGDVVLTARDEARGRAAVQQLQAEGLSPR
FHQLDIDDPQSIRALRDFLRKEYGGLNVLVNNAGIAFRSTDLTHFHILREAAMKTNFF
GTQAVCTELLPLIKTQGRVVNISSLISLEALKNCSLELQQKFRSETITEEELVGLMNK
FVEDTKKGVHAKEGWPNSAYGVSKIGVTVLSRILARKLNEQRRGDKILLNACCPGWVR
TDMAGPQATKSPEEGAETPVYLALLPPDAEGPHGQFVQDKKVEQW



Further analysis of the NOV98a protein yielded the following properties shown in 
Table 98B. 



Table 98B. Protein Sequence Properties NOV98a 


PSort 
analysis: 


0.4766 probability located in mitochondrial matrix space; 0.4500 probability 
located in cytoplasm; 0.1822 probability located in mitochondrial inner 
membrane; 0.1822 probability located in mitochondrial intermembrane space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV98a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 98C.



Table 98C. Geneseq Results for NOV98a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV98a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAW51011


Human liver carbonyl reductase - 
Homo sapiens, 277 aa. [US5756299-
A, 26-MAY-1998]


1..277 
1..277 


236/277 (85%) 
252/277 (90%) 


e-134


AAU33100


Novel human secreted protein #3591 -
Homo sapiens, 175 aa.
[WO200179449-A2, 25-OCT-2001]


142..277
39..174


119/136 (87%) 
128/136(93%) 


2e-66 


AAM73641 


Human bone marrow expressed probe 
encoded protein SEQ ID NO: 33947 - 
Homo sapiens, 123 aa. 
[WO200157276-A2, 09-AUG-2001] 


1..97 
1..97 


86/97 (88%) 
92/97 (94%) 


7e-43 


AAM60948 


Human brain expressed single exon 
probe encoded protein SEQ ID NO: 
33053 - Homo sapiens, 123 aa. 
[WO200157275-A2, 09-AUG-2001]


1..97 
1..97 


86/97 (88%) 
92/97 (94%) 


7e-43 


AAM33832 


Peptide #7869 encoded by probe for 
measuring placental gene expression - 
Homo sapiens, 123 aa. 
[WO200157272-A2, 09-AUG-2001] 


1..97 
1..97 


86/97 (88%) 
92/97(94%) 


7e-43 


In a BLAST search of public sequence databases, the NOV98a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 98D. 


Table 98D. Public BLASTP Results for NOV98a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV98a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q924V2 


CARBONYL REDUCTASE 2 - 
Cricetulus griseus (Chinese hamster), 
277 aa. 


1..277 
1..277 


243/277 (87%) 
260/277 (93%) 


e-139 


Q91X28 


SIMILAR TO CARBONYL 
REDUCTASE 1 - Mus musculus 
(Mouse), 277 aa. 


1..277 
1..277 


244/277 (88%) 
256/277 (92%) 


e-139 


Q924V3 


CARBONYL REDUCTASE 1 - 
Cricetulus griseus (Chinese hamster), 
277 aa. 


1..277 
1..277 


241/277 (87%) 
256/277 (92%) 


e-137 


P48758 


Carbonyl reductase [NADPH] 1 (EC 
1.1.1.184) (NADPH-dependent
carbonyl reductase 1) - Mus musculus 
(Mouse), 276 aa. 


2..277
1..276


240/276 (86%)
253/276 (90%) 


e-136 


JC5284 


carbonyl reductase (NADPH) (EC 
1.1.1.184), inducible - rat, 277 aa.


1..277
1..277


236/277 (85%) 
249/277 (89%) 


e-134 
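
Several expect values in these tables are printed in the shortened form used by older BLAST reports, where a value such as "e-134" omits the leading mantissa. The sketch below shows one way to normalize such strings before comparing hits numerically. It is an illustration only: the function name `parse_expect_value` is hypothetical, and it assumes that a bare leading "e" stands for a mantissa of 1 (so "e-134" is read as 1e-134).

```python
def parse_expect_value(text: str) -> float:
    """Convert an expect-value string from a BLAST-style table to a float.

    Assumption: a string that starts with a bare 'e' (for example 'e-134')
    has an implied mantissa of 1.
    """
    cleaned = text.strip().lower()
    if cleaned.startswith("e"):
        cleaned = "1" + cleaned
    return float(cleaned)


if __name__ == "__main__":
    # Values as they appear in Tables 98C and 98D.
    for raw in ["e-139", "e-134", "7e-43", "2e-66", "0.0"]:
        print(raw, "->", parse_expect_value(raw))
```

Smaller values indicate stronger matches, so sorting on the parsed value ranks the hits from most to least significant.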



PFam analysis predicts that the NOV98a protein contains the domains shown in the 
Table 98E.



Table 98E. Domain Analysis of NOV98a


Pfam Domain 


NOV98a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


adh_short: domain 1 of 1 


4..274 


67/286 (23%) 
185/286 (65%) 


1.6e-38 



Example 99. 

The NOV99 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 99A. 



Table 99A. NOV99 Sequence Analysis 




SEQ ID NO: 283 1001 bp


NOV99a, 

CG58624-01 DNA Sequence 


CTTGGTATAAGTAAGTGCTCGTCAATGTTGGCTACTCTCAATGTCAGAGCCGCAGCCG


CGGGGCGCAGAGCGCGATCTCTACCGGGACACGTGGGTGCGATACCTGGGCTATGCCA
ATCAGGTGGGCGAGGCTTTCCGCTCTCTTGTGCCAGCGGCGGTGGTGTGGCTGAGCTA
TGGCGTGGCCAGCTCCTACGTGCTGGCGGATGCCATTGACAAAGGCAAGAAGGCTGGA
GAGGTGCCCAGCCCTGAAGCAGGCCGCAGCGCCAGGGTGACTGTGGCTGTGGTGGACA 
CCTTTGTATGGCAGGCTCTAGCCTCTGTGGCCATTCCGGGCTTCACCATCAACCGCGT 
GTGTGCTGCCTCTCTCTATGTCCTGGGCACTGCCACCCGCTGGCCCCTGGCTGTCCGC 
AAGTGGACCACCACCGCGCTT<^CTGTTGACCATCCCCATCATTATCCACCCCATTG 
ACAGGGATCATCCACTCTCCAGTGATGAGAGTGGATCATCCAGTCTCCAGCACGAAGG
GCCAGGGGTCCCACAGGTGAGTGGAGCCCCAGCAGCCCCCTCAGCTCTGCGTGCCCAT
GTACTGGTCTTCTCCCTGGCTCTATACTCAGTGTTCAAGGGGTTGGACGGGGCTTGGG 
CCGCGGAGCTGCGCCTGGCTTTGCTCCTCC^ 

CCTGCAACTGCTGCAGAGCCACGTAGGGTTACAGGTGGTGGCTGGCTGTGGGATCCAC 
TTCTTGTGCATGACACTTCTAGGCATCCGGCTGGGTGCGGCTCTGGCACAGTCAGCAG 
GGCCTCTGCACCAGCTGGCCCAGTCTGTGCTAGAGGGCATGGTGGCTGGCACCTTCCT
CTATACCACCTTTCTCGAAATCTTTCCACAGGAGCTGGCGACTTCTGAGCAAAGGATC 
CTCAAGGT(ZATTCTGCTCCTAGAAGGGTGTGCCCTGCTCACTGGCCTGCTCTTCATCC 
ATATCTAGGGGGCTT 




ORF Start: ATG at 41 


ORF Stop: TAG at 992 




SEQ ID NO: 284 


317 aa 


MW at 33737.8kD 


NOV99a, 

CG58624-01 Protein Sequence 


MSEPQPRGAERDLYRDTWVRYLGYANQVGEAFRSLVPAAVVWLSYGVASSYVLADAID
KGKKAGEVPSPEAGRSARVTVAVVDTFVWQALASVAIPGFTINRVCAASLYVLGTATR
WPLAVRKWTTTALGLLTIPIIIHPIDRDHPLSSDESGSSSLQHEGPGVPQVSGAPAAP
SALRAHVLVFSLALYSVFKGLDGAWAAELRLALLLHKGTVAVSLSLQLLQSHVGLQVV
AGCGIHFLCMTLLGIRLGAALAQSAGPLHQLAQSVLEGMVAGTFLYTTFLEIFPQELA
TSEQRILKVILLLEGCALLTGLLFIHI



Further analysis of the NOV99a protein yielded the following properties shown in 
Table 99B. 



Table 99B. Protein Sequence Properties NOV99a 


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.4000 probability located in 
Golgi body; 0.3000 probability located in endoplasmic reticulum (membrane); 
0.1000 probability located in mitochondrial inner membrane 


SignalP 
analysis: 


Likely cleavage site between residues 55 and 56



A search of the NOV99a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 99C.



Table 99C. Geneseq Results for NOV99a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV99a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region


Expect 
Value 


AAM93835 


Human polypeptide, SEQ ID NO: 
3905 - Homo sapiens, 324 aa. 
[EP1130094-A2, 05-SEP-2001] 


140..317 
141..324 


134/184 (72%) 
145/184 (77%) 


3e-63 


AAY52394 


Human transmembrane protein 
HP10528 - Homo sapiens, 324 aa.
[WO9955862-A2, 04-NOV-1999]


140..317
141..324


134/184 (72%)
145/184 (77%)


3e-63 


AAY84895 


A human proliferation and apoptosis 
related protein - Homo sapiens, 324 
aa. [WO200023589-A2, 27-APR-
2000]


140..317
141..324


134/184 (72%) 
145/184 (77%) 


3e-63 


AAB43291 


Human ORFX ORF3055 polypeptide 
sequence SEQ ID NO:6110 - Homo
sapiens, 323 aa. [WO200058473-A2,
05-OCT-2000]


140..317
140..323


134/184 (72%) 
145/184(77%) 


3e-63 


AAM93650 


Human polypeptide, SEQ ID NO: 
3514 - Homo sapiens, 324 aa. 
[EP1130094-A2, 05-SEP-2001] 


140..317 
141..324 


133/184(72%) 
144/184 (77%) 


2e-62 



In a BLAST search of public sequence databases, the NOV99a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 99D. 



Table 99D. Public BLASTP Results for NOV99a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV99a
Residues/

Match
Residues


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9UDX5 


WUGSC:H_DJ0539M06.2 PROTEIN - 
Homo sapiens (Human), 166 aa. 


1..152 
1..152


145/152 (95%)
145/152 (95%)


6e-78 


Q9CRB8 


2610507A21RIK PROTEIN
(1700020C11RIK PROTEIN) - Mus
musculus (Mouse), 166 aa.


1..168
1..164


133/168 (79%)
143/168 (84%)


8e-69


Q9CZX4


2610507A21RIK PROTEIN - Mus
musculus (Mouse), 166 aa.


1..143
1..143


125/143 (87%)
133/143 (92%)


2e-68 


Q9NY26


IRT1 PROTEIN (SIMILAR TO
ZINC/IRON REGULATED
TRANSPORTER-LIKE)
(HYPOTHETICAL 34.2 KDA
PROTEIN) (UNKNOWN) (PROTEIN
FOR MGC:14180) - Homo sapiens
(Human), 324 aa.


140..317
141..324


134/184 (72%)
145/184 (77%)


1e-62


Q9Y380


CGI-71 PROTEIN - Homo sapiens
(Human), 324 aa.


140..317
141..324


134/184 (72%)
145/184 (77%)


1e-62



PFam analysis predicts that the NOV99a protein contains the domains shown in the 
Table 99E. 



Table 99E. Domain Analysis of NOV99a 


Pfam Domain 


NOV99a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


Syndecan: domain 1 of 1


235..255 


9/21 (43%) 
16/21 (76%) 


6.9 


Zip: domain 1 of 1 


174..313 


52/178 (29%) 
108/178 (61%) 


2.3e-15 



Example 100. 



The NOV100 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 100A. 



Table 100A. NOV100 Sequence Analysis




SEQ ID NO: 285 


987 bp 


NOV100a,

CG59679-01 DNA Sequence 


AGACGCTCACACAGACAACCTCAAGTCCAGCAACATCTTAGTAGCCCAAAATCGACTG


CTTTAGTTCTTCTGGTGGGTGCCTCTCACTGTCCACTCGGCTATGCCATCCTGCAGTC 


GCATTGCACTGGTGACTGGAGCTAATAAGGGCATTGGCTTTGCGATCACTCGTGACCT 
GTGTCAGCAATTCTCAGGGGATGTGGTGCTCACTGCACGGGACGAGGCACGGGGCCTT 
GCGGCAGTGCAGAAGCTGCAGGCTGAGGGCCTGATTCCTCGCTTCCACCAGCTGGACA
TCAATGACCCTCAGAGCATCCATGCACTTCGCAACTTTCTGCTCAAGGAGTACGGAGG 
CCTGGATGTGCTGGTCAACAACGCGGGCATTGGCGTGCTTTTCAAAGTGGATGACCCA 
ACACCCTTCGACATTCAAGCTGAGGTGACACTX3AAGACGAACTTTTTTGCCACTAGAA 
ATGTCTGCACTGAGTTACTGCCTATAATGAAACCACATGGTAGAGTGGTGAACATCAG 
CAGTCTGCAGGGGTTAAAAGCCCTTGAGAACTGCAGGGAAGATCTTCAGGAAAAGTTC
CGATGTGACACACTTACCGAGGTGGACCTGGTCGACCTCATGAAAAAGTTTGTGGAGG
ATACAAAAAATGAAGTCCATGAGAGGGAAGGTTGGCCAGACTCGGCTTACGGGGTGTC 
GAAGCKK^GTGACAGTCCTTACGAGGATCCTGGCCCGGCAGCTGGATGAAAAGAGG 
AAAGCGGACAGGATTCTGCTCAATGCCTGCTGCCCGGGATGGGTGAAGACCGACATGG 
CGAGGGACCAGGGCTCCCGGACCGTGGAAGAGGGGGCCGAAACCCCCGTTTACTTGGC 
TCTCCTGCCTCCAGATGCCACTGAACCTCACGGCCAGCTAGTCCGTGACAAAGTTGTG
CAAACTTGGTGAACGTCTGCTCTGGGGCTTAATTGTTTGATAAACGTTAGCGGGAGAG 
A 




ORF Start: ATG at 101


ORF Stop: TGA at 938




SEQ ID NO: 286


279 aa


MW at 31007.2kD


NOV100a,

CG59679-01 Protein Sequence 


MPSCSRIALVTGANKGIGFAITRDLCQQFSGDVVLTARDEARGLAAVQKLQAEGLIPR
FHQLDINDPQSIHALRNFLLKEYGGLDVLVNNAGIGVLFKVDDPTPFDIQAEVTLKTN
FFATRNVCTELLPIMKPHGRVVNISSLQGLKALENCREDLQEKFRCDTLTEVDLVDLM
KKFVEDTKNEVHEREGWPDSAYGVSKLGVTVLTRILARQLDEKRKADRILLNACCPGW
VKTDMARDQGSRTVEEGAETPVYLALLPPDATEPHGQLVRDKVVQTW



Further analysis of the NOV100a protein yielded the following properties shown in
Table 100B. 



Table 100B. Protein Sequence Properties NOV100a


PSort 
analysis: 


0.3600 probability located in mitochondrial matrix space; 0.3000 probability 
located in microbody (peroxisome); 0.1808 probability located in lysosome
(lumen); 0.0000 probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV100a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 100C.



Table 100C. Geneseq Results for NOV100a


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV100a
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAW51011 


Human liver carbonyl reductase - 
Homo sapiens, 277 aa. [US5756299-
A, 26-MAY-1998]


1..279 
1..277 


198/279 (70%) 
233/279 (82%) 


e-112 


AAU33100 


Novel human secreted protein #3591 
- Homo sapiens, 175 aa. 
[WO200179449-A2, 25-OCT-2001]


145..279
40..174


88/135 (65%)
110/135 (81%)


2e-48 


AAG46601 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 58644 - Arabidopsis 
thaliana, 302 aa. [EP1033405-A2, 06- 
SEP-2000] 


3..259 
20..283


106/268 (39%) 
157/268 (58%) 


6e-43 


AAG46600 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 58643 - Arabidopsis 
thaliana, 316 aa. [EP1033405-A2, 06-
SEP-2000]


3..259
34..297 


106/268 (39%) 
157/268 (58%) 


6e-43 


AAG46599 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 58642 - Arabidopsis 


3..259
45..308 


106/268 (39%) 
157/268 (58%) 


6e-43


SEP-2000]









In a BLAST search of public sequence databases, the NOV100a protein was found to
have homology to the proteins shown in the BLASTP data in Table 100D.



Table 100D. Public BLASTP Results for NOV100a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV100a
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched
Portion 


Expect 
Value 


Q9JJN7 


CARBONYL REDUCTASE (EC
1.1.1.184) (CARBONYL
REDUCTASE 3) - Cricetulus griseus
(Chinese hamster), 277 aa. 


1..279 

/ / 


246/279(88%) 
Zoz/i /y (yjyoj 


e-140 




Homo sapiens (Human), 277 aa. 


1 970 
1..277 


997/970^81°/^ 
zz / / z, iy yo i /oj 

246/279(87%) 




O75828


Carbonyl reductase [NADPH] 3 (EC
1.1.1.184) (NADPH-dependent
carbonyl reductase 3) - Homo sapiens 
(Human), 276 aa. 


3..279 
2..276 


226/277 (81%)
245/277 (87%)


e-126 


Q924V2 


CARBONYL REDUCTASE 2 - 
Cricetulus griseus (Chinese hamster), 
277 aa. 


1..279 
1..277 


206/279 (73%) 
244/279 (86%) 


e-119 


Q91X28 


SIMILAR TO CARBONYL 
REDUCTASE 1 - Mus musculus 
(Mouse), 277 aa. 


1..279 
1..277 


204/279 (73%) 
240/279 (85%) 


e-116 



PFam analysis predicts that the NOV100a protein contains the domains shown in
Table 100E.



Table 100E. Domain Analysis of NOV100a


Pfam Domain


NOV100a Match
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value


adh_short: domain 1 of 
1 


4..277 


77/316(24%) 
186/316(59%) 


5.2e-31 



Example 101. 



The NOV101 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 101A.



Table 101A. NOV101 Sequence Analysis




SEQ ID NO: 287


1011 bp


NOV101a,

CG59644-01 DNA Sequence 


CTCCTCGGGGGGGCGGCGGCGGCGATGTTCTCGGTCCTCTCGTACGGGCGGCTGGTGG 
CCCGCGCCGTGCTCGGCGGCCTCTCGCAGACCGACCCCAGGGCCGGCGGCGGCGGCGG 
CGGCGCCGTGCTCGGCGGCCTCTCGCAGACCGACCCCAGGGCCGGCGGCGGCGGCGGC 
GGCGACTACGGACTGGTGACGGCCGGCTGCGGCTTCGGGAAGGACTTCCGTAAGGGCC 
TCCTCAAGAAGGGCGCGTGCTACXK3GGACGACGCGTGCTTCGTGGCCCGGCACCGTTC 
CGCGGACGTGCTCGGTGTTGCAGATGGTGTAGGAGGCTGGAGAGACTATGGAGTTGAT
CCATCTCAATTCTCAGGGACTTTAATGCGGACGTGTGAACGTTTAGTAAAAGAAGGAC 
GGTTCGTACCTAGTAATCCCATTGGAATTCTCACCACAAGCTACTGTGAGTTGCTGCA
AAATAAAGTCCCTTTGCTCGGTAGCAGCACCGCCTGCATTGTGGTGCTGGACAGAACC 
AGCCACCGCTTACACACAGCAAACCTGGGCGATTCAGGCTTCCTGGTTGTCAGGGGTG
GTGAAGTCGTGCACCGATCAGATGAGCAGCAGCATTACTTCAACACTCCATTCCAGCT 
CTCAATCGCTCCCCCTGAAGCCGAGGGAGTCGTCTTGAGCGACAGTCCGGATGCTGCT 
GATAGCACGTCTTTCGATGTCCAGCTAGGAGACATTATCCTGACGGCAACAGATGGAC
TCTTTGACAACATGCCTGATTATATGATTCTTCAGGAGCTAAAAAAGTTAAAGAATTC
AAATTATGAGAGTATACAACAGACTGCCAGAAGCATTGCTGAGCAAGCTCATGAGCTG
GCCTATGACCCAAATTATATGTCACCTTTrGCACAGTTTGCATGTGACAATGGATTGA 
ATCTCAGAGGTGGTGGAAAGCCAGATGACATCACCGTCCTTCTTTCAATAGTGGCTGA 
GTATACAGACTAGCTGAGGTGTCAA 




ORF Start: ATG at 25 


ORF Stop: TAG at 997 




SEQ ID NO: 288 


324 aa 


MW at 3431LlkD


NOV101a,

CG59644-01 Protein Sequence 


MFSVLSYGRLVARAVLGGLSQTDPRAGGGGGGAVLGGLSQTDPRAGGGGGGDYGLVTA
GCGFGKDFRKGLLKKGACYGDDACFVARHRSADVLGVADGVGGWRDYGVDPSQFSGTL
MRTCERLVKEGRFVPSNPIGILTTSYCELLQNKVPLLGSSTACIVVLDRTSHRLHTAN
LGDSGFLVVRGGEVVHRSDEQQHYFNTPFQLSIAPPEAEGVVLSDSPDAADSTSFDVQ
LGDIILTATDGLFDNMPDYMILQELKKLKNSNYESIQQTARSIAEQAHELAYDPNYMS
PFAQFACDNGLNVRGGGKPDDITVLLSIVAEYTD



Further analysis of the NOV101a protein yielded the following properties shown in
Table 101B.



Table 101B. Protein Sequence Properties NOV101a


PSort 
analysis: 


0.5708 probability located in mitochondrial matrix space; 0.4996 probability 
located in mitochondrial intermembrane space; 0.2852 probability located in 
mitochondrial inner membrane; 0.2852 probability located in mitochondrial outer 
membrane 


SignalP 
analysis: 


Likely cleavage site between residues 23 and 24 



A search of the NOV101a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 101C.



Table 101C. Geneseq Results for NOV101a


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV101a
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB85357 


Human phosphatase (PP) (clone ID 
3402521CD1) - Homo sapiens, 304
aa. [WO200153469-A2, 26-JUL-
2001]


1..324
1..304


304/324 (93%)
304/324 (93%)


e-173 


AAU32112


Novel human secreted protein #2603 -
Homo sapiens, 304 aa.
[WO200179449-A2, 25-OCT-2001]


25..324
6..304


272/300 (90%)
274/300 (90%)


e-156 


AAG52267 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 66421 - Arabidopsis 
thaliana, 348 aa. [EP1033405-A2, 06- 
SEP-2000] 


71..320
99..340


101/261 (38%)
133/261 (50%)


4e-33 


AAG52266


Arabidopsis thaliana protein fragment
SEQ ID NO: 66420 - Arabidopsis
thaliana, 374 aa. [EP1033405-A2, 06-
SEP-2000]


71..320
125..366


101/261 (38%)
133/261 (50%)


4e-33 


AAG52265


Arabidopsis thaliana protein fragment
SEQ ID NO: 66419 - Arabidopsis
thaliana, 467 aa. [EP1033405-A2, 06-
SEP-2000]


71..320
218..459


101/261 (38%)
133/261 (50%)


4e-33 



In a BLAST search of public sequence databases, the NOV101a protein was found to
have homology to the proteins shown in the BLASTP data in Table 101D.



Table 101D. Public BLASTP Results for NOV101a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV101a
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9W0E2 


CG12091 PROTEIN - Drosophila
melanogaster (Fruit fly), 321 aa. 


1..320 
1..320 


163/322 (50%) 
218/322 (67%) 


1e-83


Q9W3R1 


CG15035 PROTEIN - Drosophila 
melanogaster (Fruit fly), 374 aa. 


55..319 
109..373 


127/266 (47%) 
178/266 (66%) 


1e-64


O18183


W09D10.4 PROTEIN -
Caenorhabditis elegans, 330 aa. 


4..320 
7..330 


136/331 (41%) 
198/331 (59%) 


2e-60 


Q9VAH4 








2e-56
melanogaster (Fruit fly), 314 aa.


26..309


168/285 (58%)




Q9SUK9 


HYPOTHETICAL 36.2 KDA 
PROTEIN - Arabidopsis thaliana 
(Mouse-ear cress), 335 aa. 


71..320 
86..327 


101/261 (38%) 
133/261 (50%) 


1e-32



PFam analysis predicts that the NOV101a protein contains the domains shown in
Table 101E.



Table 101E. Domain Analysis of NOV101a


Pfam Domain


NOV101a Match Region


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


PP2C: domain 1 of 1 


147..191 


13/48 (27%) 
36/48 (75%) 


0.26 



Example 102. 

The NOV102 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 102A.



Table 102A. NOV102 Sequence Analysis




SEQ ID NO: 289 


523 bp 


NOV102a, 

CG59662-01 DNA Sequence 


AGTCCCAGTACTATCAGCCATGGTCAACCACACCATGTTCTTCGACGTTGCTGTCGAC 
AGTGAGCCCTTGGACCACGTCTCCTTTGAGCTGTTTGCAGAAAAGTTTCCAAAGACAG
CAG AAAACGT^CGTCCTCTG AGCACTG AAG AG AAAGGATTTGGTTATAAGGGT C CCTG 
CTITCACAGAATTATACCAGCATTTATGTGTCAGGGTGGTGACTTCACGCACCATAAT 
GGCACTGGTGGCAAGTCCATCTACGGGGAGAAATTTGAAGATGAGAAATTTATCCTAA 
AGCGTACAGGTCCTGGCATCTTGTCCATGGCAAATTCTGGACCCAACACAAACTGTTC 
CGTTTTTTTCATCTGCACTGCCAAGACGGGGTCGTTC 

GGCAAGGTGAAAGAAGGCATGAATATTTTGGAGGCCATAGAGCAATTTGGGTCCAGGA 
ATGGCAAGACCAGCAAGAAGACCACCATTGCTGACTGTGGACAGCTCTGGTAAGTTTG 
A 




ORF Start: ATG at 20 


ORF Stop: TAA at 515 




SEQ ID NO: 290 


165 aa MW at 18237.7kD 


NOV102a,

CG59662-01 Protein Sequence 


MVNHTMFFDVAVDSEPLDHVSFELFAEKFPKTAENVRALSTEEKGFGYKGPCFHRIIP 
AFMCQGGDFTHHNGTGGKSIYGEKFEDEKFILKRTGPGILSMANSGPNTNCSVFFICT
AKTGWLDGKHVVFGKVKEGMNILEAIEQFGSRNGKTSKKTTIADCGQLW



Further analysis of the NOV102a protein yielded the following properties shown in
Table 102B.



Table 102B. Protein Sequence Properties NOV102a


PSort 
analysis: 


0.6400 probability located in microbody (peroxisome); 0.4500 probability 
located in cytoplasm; 0.1000 probability located in mitochondrial matrix space; 
0.1000 probability located in lysosome (lumen) 




SignalP
analysis:


No Known Signal Sequence Predicted



A search of the NOV102a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 102C.



Table 102C. Geneseq Results for NOV102a


Geneseq 
Identifier 


Protein/Organism/Length [Patent
#, Date]


NOV102a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Region 


Expect 
Value 


AAU01195 


Human cyclophilin A protein - Homo
sapiens, 165 aa. [WO200132876-A2,
10-MAY-2001]


1..164
1..164


141/164 (85%)
148/164 (89%)


1e-80


AAW56028 


Calcineurin protein - Mammalia, 165 
aa. [WO9808956-A2, 05-MAR-1998] 


1..164 
1..164 


141/164(85%) 
148/164 (89%)


1e-80


AAG65275 


Haematopoietic stem cell 
proliferation agent related human 
protein #2 - Homo sapiens, 164 aa. 
[JP2001163798-A, 19-JUN-2001] 


2..164 
1..163 


140/163 (85%) 
147/163 (89%) 


5e-80 


AAP90431 


Cyclophilin - Homo sapiens (human), 
164 aa. [EP326067-A, 02-AUG-1989] 


2..164 
1..163 


140/163 (85%) 
147/163 (89%) 


5e-80 


AAG03831 


Human secreted protein, SEQ ID NO: 
7912 - Homo sapiens, 165 aa. 
[EP1033401-A2, 06-SEP-2000] 


1..164 
1..164 ! 


140/164(85%) 
147/164 (89%) 


8e-80 



In a BLAST search of public sequence databases, the NOV102a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 102D. 



Table 102D. Public BLASTP Results for NOV102a 

Protein Accession Number | Protein/Organism/Length | NOV102a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value 
CAC39529 | SEQUENCE 26 FROM PATENT WO0132876 - Homo sapiens (Human), 165 aa. | 1..164; 1..164 | 141/164 (85%); 148/164 (89%) | 4e-80 
Q9BRU4 | PEPTIDYLPROLYL ISOMERASE A - Homo sapiens (Human), 165 aa. | 1..164; 1..164 | 140/164 (85%); 147/164 (89%) | 2e-79 
P05092 | Peptidyl-prolyl cis-trans isomerase A (EC 5.2.1.8) (PPIase) (Rotamase) (Cyclophilin A) (Cyclosporin A-binding protein) - Homo sapiens (Human), 164 aa. | 2..164; 1..163 | 140/163 (85%); 147/163 (89%) | 2e-79 
Q96IX3 | PEPTIDYLPROLYL ISOMERASE A (CYCLOPHILIN A) - Homo sapiens (Human), 165 aa. | 1..164; 1..164 | 140/164 (85%); 147/164 (89%) | 5e-79 
P04374 | Peptidyl-prolyl cis-trans isomerase A (EC 5.2.1.8) (PPIase) (Rotamase) (Cyclophilin A) (Cyclosporin A-binding protein) - Bos taurus (Bovine), and, 163 aa. | 2..164; 1..163 | 138/163 (84%); 147/163 (89%) | 7e-79 
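
The residue ranges and the identity fractions in Table 102D can be checked against each other: for each hit, the length of the NOV102a range, the length of the match range, and the denominator of the identity fraction should all be equal. The Python sketch below illustrates this for two rows of the table. The helper names `range_length` and `fraction` are hypothetical, and the check assumes the ranges are inclusive and the alignments contain no gaps.

```python
import re
from typing import Tuple


def range_length(span: str) -> int:
    """Length of an inclusive residue range written as 'first..last'."""
    first, last = (int(part) for part in span.split(".."))
    return last - first + 1


def fraction(cell: str) -> Tuple[int, int]:
    """Parse 'matches/aligned (pct%)' into (matches, aligned)."""
    found = re.match(r"\s*(\d+)\s*/\s*(\d+)", cell)
    if found is None:
        raise ValueError(f"not an identity fraction: {cell!r}")
    return int(found.group(1)), int(found.group(2))


# (accession, NOV102a residues, match residues, identities) from Table 102D
hits = [
    ("CAC39529", "1..164", "1..164", "141/164 (85%)"),
    ("P05092", "2..164", "1..163", "140/163 (85%)"),
]

for accession, query_span, match_span, identities in hits:
    matches, aligned = fraction(identities)
    # All three lengths agree when the alignment is ungapped.
    assert range_length(query_span) == range_length(match_span) == aligned
```

The same check can be applied to the Geneseq hits in Table 102C, which use the same column layout.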



PFam analysis predicts that the NOV102a protein contains the domains shown in 
Table 102E. 



Table 102E. Domain Analysis of NOV102a 

Pfam Domain | NOV102a Match Region | Identities; Similarities for the Matched Region | Expect Value 
pro_isomerase: domain 1 of 1 | 5..165 | 105/180 (58%); 141/180 (78%) | 4.2e-91 



Example 103. 



The NOV103 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 103A. 
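
Each predicted polypeptide in Table 103A follows from its nucleotide sequence by translating codons from the ORF start. The Python sketch below is illustrative only: it translates the first 30 nucleotides of the NOV103a ORF (beginning at the ATG at position 658, copied with the stray spaces of the printed listing) using the standard genetic code. The function name `translate` and the compact codon-table construction are choices made for this example and are not part of the specification.

```python
# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA from its first base, stopping at the first stop codon.

    Whitespace is removed first, so sequences copied from a listing with
    stray spaces can be passed in unchanged.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 30 nt of the NOV103a ORF as printed (spaces are extraction artifacts).
orf_head = "ATGAAGG AG ATTTG CAGG ATCTGTGCCCG A"
print(translate(orf_head))  # prints MKEICRICAR
```

The result matches the first ten residues of the NOV103a protein sequence (SEQ ID NO: 292). The same approach can be used to resolve residues that are illegible in the printed protein listing.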



Table 103A. NOV103 Sequence Analysis 




SEQ ID NO: 291 8860 bp 


NOV103a, 

CG59773-01 DNA Sequence 


GG ATCCTTGAGGG CACTGGTGCGACTTTC AGGTG AGGTCTTAG CAG ATG AAAGCGGCT 


GGCTGTGGCCCGCGCCAGTAGTGCTTTCTGCTCCGCACTCGCCGTGAGCCAGGTGTGC 


AACCGGATTTGGGGCGAGGGTCGCGCTGGCTACCTCGCATGCGCAGAGCCGGAAGCCC 


GCTGACCGG ACTACAG CT CCCAG AAGAGC CTTGTGG AGG CCG CAG ACG CG AAG CCGCT 


GGCGCCATCTTGAAATCTGATCCTCCATCCCCGAGGCTTTGCGTCTGCGCGGCCGGCC 


GCTGCTGCTCCGGGAGCCCAGTCTGCTAAAAGGGGAGGACGTTGAGGACGCGGCGGCT 


GG CGGGAG AG ACAGCTGGGG AG AGAC ATGG CAGGGTCGGAG CGCGGCCTG CG CCTCTG 


TCACTCAGCATCCTCTTAGGCGTTTCCACGCCCGCCCCCTGCCCGAGGGGCGGGGCTG 


ACGGCTCTGGTACCCGGAGTCGGCGCGCGGGGCAGGGGCGCGCCCCTGCAGAGTGGGG 


ACCCCACTGGGCTGTGCCATGCTGACCGGAGACCACCGAGGCGGGAGACAGAGCGCGG 


CGAAGAGCCATTGAGTGGTCACCCAGTAGCCGCCGCCGCCGCCGCCTCGGGAAGCTTG 


CCACCCG CTAGG AGGG AAGATGAAGG AG ATTTG CAGG ATCTGTGCCCG AGAG CTGTGT 
GGAAACCAGCGGCGCTGGATCTTCCACACGGCGTCCAAGCTCAATCTCCAGGTTCTGC 
TTTCGCACGTCTTGGGCAAGGATGTCCCCCGCGATGGCAAAGCCGAGTTCGCTTGCAG 
CAAGTGTGCTTTCATGCTTGATCGAATCTATCGATTCGACACAGTTATTGCCCGGATT 
GAAGCG CTTTCTATTG AGCGCTTGCAAAAGCTGCTACTGG AG AAGG AT CGCCTCAAGT 
TCTG CATTGCCAGTATGT ATCGGAAG AAT AACG ATGACTCTGG CGCGG AG ATCAAGGC 
GGGGAATGGGACGGTTGACATGTCCGTCTTACCCGATGCGAGATACTCTGCACTGCTC 
CAGGAGGACTTCGCCTATTCAGGGTTTGAGTGCTGGGTGGAGAATGAGGATCAGATCC 
AGGAGCCACACAGCTGCCATGGTTCAGAAGGCCCTGGAAACCGACCCAGGAGATGCCG 
TGGTTGTGCCGCTTTGCGGGTTGCTGATTCTGACTATGAAGCCATTTGTAAGGTACCT 
CGAAAGGTGGCCAGAAGTATCTCCTGCGGCCCTTCTAGCAGGTGGTCGACCAGCATTT 
GCACTGAAGAACCAGCGTTGTCTGAGGTTGGGCCACCCGACTTAGCAAGCACAAAGGT 
ACCCCCAGATGGAGAAAGCATGGAGGAAGAGACGCCTGGTTCCTCTGTGGAATCTTTG 
GATGCAAGCGTCCAGGCTAGCCCTCCACAACAGAAAGATGAGGAGACTGAGAGAAGTG 
CAAAGGAACTTGGAAAGTGTGACTGTTGTTCAGATGATCAGGCTCCGCAGCATGGGTG 
TAATCACAAGCTGGAATTAGCTCTTAGCATGATTAAAGGTCTTGATTATAAGCCCATC 
CAGAGCCCCCGAGGGAGCAGGCTTCCGATTCCAGTGAAATCCAGCCTACCTGGAGCCA 
AGCCTGGCCCTAGCATGACAGATGGAGTTAGTTCCGGTTTCCTTAACAGGTCTTTGAA 
ACCCCTTTACAAGACACCTGTGAGTTATCCCTTGGAGCTTTCAGACCTGCAGGAGCTG 
TGGGATGATCTCTGTGAAGATTATTTGCCGCTCCGGGTCCAGCCCATGACTGAAGAGT 
TGCTGAAACAAC AAAAG CTGAATTCACATGAG AC CACTAT AACTCAGCAGTCTGTATC 
TG ATTC CCACTTGG CAG AACT CCAGG AAAAAAT C CAG CAAACAG AGGC CA CC AACAAG 
ATTCTTCAAGAGAAACTTAATGAAATGAGCTATGAACTAAAGTGTGCTCAGGAGTCGT 
CTCAAAAGCAAGATGGTACAATTCAGAACCTCAAGGAAACTCTGAAAAGCAGGGAACG 
TGAGACTGAGGAGTTGTACCAGGTAATTGAAGGTCAAAATGACACAATGGCAAAGCTT 
CGAGAAATGCTGCACCAAAGCCAGCTTGGACAACTTCACAGCTCAGAGGGTACTTCTC 
CAGCTCAGCAACAGGTAGCTCTGCTTGATCTTCAGAGTGCTTTATTCTGCAGCCAACT 
TG AAAT ACAG AAGCTCCAGAGGGTGGTACGACAGAAAGAGCG CCAACTGGCTGATG CC 
AAACAATGT G TG CAATTTGT AG AGG CTG CAG C ACACG AG AGTG AACAGCAG AAAGAGG 
CTT CTTGG AAACAT AACC AGG AATTG CG AAAAGCCTTG CAG CAG CTAC AAG AAGAATT 
GCAGAATAAGAGCCAACAGCTTCGTGCCTGGGAGGCTGAAAAATACAATGAGATTCGA 
AC C CAG G AA C AAAA CATC CAG C A CCT AAAC C ATAGT CTG AGT CACAAGG AG CAGTTG C 
TTCAGGAATTTCGGGAGCTCCTACAGTATCGAGATAACTCAGACAAAACCCTTGAAGC 
AAATGAAATGTTGCTTGAGAAACTTCGCCAGCGAATACATGATAAAGCTGTTGCTCTG 
GAGCGGGCTATAGATGAAAAATTCTCTGCTCTAGAAGAGAAAGAAAAAGAACTGCGCC 
AGCTTCGTCTTGCTGTGAGAGAGCGAGATCATGACTTAGAGAGACTGCGCGATGTCCT 
CTCCTCCAATGAAGCTACTATGCAAAGTATGGAGAGTCTCCTGAGGGCCAAAGGCCTG 
GAAG TGG AACAG TT AT CTAC TACCTGT CAAAAC CTC CAGTGG CTG AAAG AAG AAAT GG 
AAAC CAAATTTAGCCGTTGG CAG AAGG AACAAG AG AGT ATCATT CAG CAG TT ACAG AC 
GTCTCTTCATGATAGGAACAAAGAAGTGGAGGATCTTAGTGCAACACTGCTCTGCAAA 
CTTGGACCAGGGCAGAGTGAGATAGCAGAGGAGCTGTGCCAGCGTCTACAGCGAAAGG 
AAAGGATGCTGCAGGACCTTCTAAGTGATCGAAATAAACAAGTGCTGGAACATGAAAT 
GGAGATTCAAGGCCTGCTTCAGTCTGTGAGCACCAGGGAGCAGGAAAGCCAAGCTGCT 
GCAGAGAAGTTGGTGCAAG CCTT AATGG AAAGAAATTC AGAATT ACAGGCCCTGCG CC 
AATATTTAGGAGGGAGAG ACTCCCTGATGTCC CAAGCACCCATCTCT AACCAACAAG C 
TG AAGTT AC C CC CACTG G C CGTCTTGG AAAA CAG ACTGATCAAGGTT C AATGCAG AT A 
CCTTCCAGAGATGATAGCACTTCATTGACTGCCAAAGAGGATGTCAGCATACCCAGAT 
CCACATTAGGAGACTTGGACACAGTTGCAGGGCTGGAAAAAGAACTGAGTAATGCCAA 
AG AGG AACTTG AACTCATGG CTAAAAAAG AAAG AG AAAG TCAGATGGAACTTT CTGCT 
CTACAGTCCATGATGGCTGTGCAGGAAGAAGAGCTGCAGGTGCAGGCTGCTGATATGG 
AGTCTCTGACCAGGAACATACAGATTAAAGAAGATCTCATAAAGGACCTGCAAATGCA 
ACTGGTTG ATCCTG AAGACATACCAG CTATGGAACG CCTG ACCCAGG AAGTCTTACTT 
CTT CGG G AAAAAG TTG CTT CAG T AG AAT C C CAGGGT CAAG AAATTT CAGG AAAC CG AA 
G ACAACAGTTGCTG CTGATG CTAG AAGG ACT AGTAGATG AACGGAGTCGG CTCAATGA 
GGCCTTACAAGCAGAGAGACAGCTCTATAGCAGTCTGGTGAAGTTCCATGCCCATCCA 
GAGAGCTCTGAGAGAGACCGAACTCTGCAGGTGGAACTGGAAGGGGCTCAGGTGTTAC 
G CAGTCGG CTAG AAG AAG TT CTTG GAAG AAG CTTGG AG CG CTT AAACAGG CTGGAG AC 
CCTGGCCG CCATTGGAGGTG CAGCTGCAGGGGATG ACACCG AAG ATACAAGC ACTG AG 
TTCACTG ACAGT ATTGAGG AGG AGGCTG C ACAC CATAGTCACCAGCAACTTGTCAAGG 
TGGCTTTGGAGAAAAGTCTGGCAACTGTGGAGACCCAG AAC CCATCTTTTTCCCCT CC 
TTCTCCGATGGGAGGGGACAGTAACAGGTGTCTTCAGGAAGAAATGCTCCACCTGAGG 
GCTG AGTT CCAC CAG CACTT AG AAG AG AAG AGG AAAG CTG AGG AGG AACTG AAG G AG C 
TAAAGGCTCAAATTGAGGAAGCAGGATTCTCCTCAGTGTCCCACATCAGGAACACCAT 
GCTG AGCCTTTGCCTTGAG AATGCGG AG CTGAAAG AG C AGATGGG AGAAG CAATGT CT 
GATGGATGGGAGATCGAGGAAGACAAGGAGAAGGGCGAGGTGATGGTTGAGACTGTGG 
TAACCAAAGAGGGTCTGAGTGAGAGTAGCCTTCAGGCTGAGTTCAGAAAGCTCCAGGG 
AAAACTGAAGAATGCCCACAATATCATCAACCTCCTCAAAGAACAACTTGTGCTGAGT 
AGCAAGGAAGGGAATAGTAAACTTACTCCAGAGCTCCTTGTGCATCTGACCAGCACCA 
TTGAAAGAATAAACACAGAACTGGTTGGTTCCCCTGGGAAGCACCAACACCAAGAGGA 
GGGG AATGTGACTGTGAGGCCTTTCCCCAGACCCCAGAGCCTTGACCTTGGGGCTACC 
TT CACAGTGG ATGCCCACCAATTGG ATAAC CAGT CC CAGCCTCGTG ACCCTGGGCCTC 
AGT CAG CGTTTAGCCTACCAGGGTCCACC CAG CA CCTG CG CTC CCAGCTGTCACAATG 
CAAACAACG CTATCAAG ATCTCCAGG AGAAGCTG CTGCT ATCAGAAGCCACTGTCTTT 
GCTCAGGCTAACGAGCTGGAGAAATACAGAGTTATGCTTACAGGTGAATCCTTGGTGA 
AG CAGG ACAGCAAG CAGATCCAGGTGGACCTCCAGG ACCTGGGCTATGAG ACTTGTGG 
CCGAAG CGAG AATG AGGCTGAACGGG AGG AAACCACCAGTCCTG AGTGTGAGGAG CAC 
AACAGCCTCAAGGAAATGGTCCTGATGGAGGGGCTGTGCTCTGAGCAGGGACGCCGGG 
GCTCAACACTGGCTAGTTCCTCTGAGAGGAAGCCCTTGGAGAACCAGCTAGGGAAGCA 
GGAAGAGTTCCGGGTATATGGAAAGTCAGAAAACATCTTGGTCCTACGAAAGGACATC 
AAAG ATCTGAAGGCCCAG CTGCAGAATG CCAACAAGGT CATTCAAAACCTCAAG AGCC 
GGGTCCGGTCCCTCTCAGTTACAAGTGATTATTCGTCTAGTCTGGAAAGACCCCGGAA 
GCTG AG AG CTGTTGGC AC CTTGG AGGGGTCTTCACCTCATAGTGTCCCTGATG AGG AT 
GAGGGGTGGCTGTCTG ATGG CACTGGGGCTTT CTACTCTCCAGGGCTTCAGGCCAAAA 
AGGACCTGGAGAGTCTCATCCAG AGAGTATCCCAG CTGGAGGCC CAGCTCCCAAAAAA 
TGGACTAGAAGAGAAGCTGGCTGAGGAGCTGAGATCAGCCTCGTGGCCTGGGAAATAT 
GATTCCCTGATTCAGGATCAGGCCCGGGAACTGTCTTACCTACGGCAAAAAATACGAG 
AAGGGAGAGGTATTTGTTATCTTATCACCCGGCATGCAAAAGATACAGTAAAATCTTT 
TGAGGATCTCCTAAGGAGCAATGACATTGACTACTACCTGGGACAGAGCTTCCGGGAG 
CAACTCGCCCAGGGAAGCCAGCTGACAGAGAGGCTCACCAGCAAACTCAGCACCAAGG 
ATC AT AAAAGTG AGAAAG ATCAAG CTGG ACTTG AGCCACTGGCC CTCAGGCT CAGC AG 
GGAGCTGCAGGAGAAGGAGAAAGTGATTGAAGTCCTGCAGGCCAAGCTGGATGCTCGG 
TCCCTCACACCCTCCAGCAGCCATGCCTTGTCTGACTCCCACCGCTCTCCCAGCAGCA 
CCTCTTTCCTGTCTG ATGAACTGGAAGCCTG CTCTGACATGG ACAT AGTCAG CGAGTA 
CACACACTATGAAGAGAAGAAAGCTTCTCCCAGTCACTCAGATTCCATCCATCATTCG 
AGTCATTCTGCTGTGTTGTCTTCTAAACCATCATCAACCAGTGCATCTCAGGGGGCTA 
AGGCCGAATCCAACAGCAACCCCATCAGCTTGCCAACTCCCCAGAATACCCCCAAGGA 
GGCCAACCAGGCCCATTCAGGCTTTCATTTTCACTCCATACCCAAGCTGGCTAGCCTT 
CCTCAGGCACCATTGCCCTCAGCTCCATCCAGCTTCCTGCCTTTCAGCCCCACTGGCC 
CTCTCCTCCTTGGCTGCTGTGAGACACCAGTGGTCTCCTTGGCTGAGGCTCAGCAGGA 
GCTACAGATGCTGCAGAAGCAGTTGGGAGAAAGTGCCAGCACTGTTCCTCCTGCTTCC 
ACAGCTACATTGCTGAGCAACGACTTGGAAGCCGACTCTTCCTACTACCTCAACTCTG 
CC CAG CCT C ACTCT CCT C CAAGGGG CAC CAT AG AACTGG G AAG AATCCT AG AG CCTGG 
GTACCTGGGCAGCAGTGGCAAGTGGGATGTGATGAGGCCTCAGAAAGGGAGTGTATCT 
GGGGACCTATCCTCAGGCTCCTCTGTGTACCAGCTTAACTCCAAACCCACAGGGGCTG 
ACCTGCTGGAAGAGCATCTTGGTGAAATCCGGAACCTGCGCCAGCGCCTGGAGGAGTC 
CATCTGCATCAATGACCGCCTACGGGAGCAACTGGAACACCGGCTGACCTCTACTGCT 
CGTGGAAGGGGATCCACTTCTAACTTCTACAGTCAGGGCCTGGAGTCCATACCTCAGC 
TCTCCAATGAGAACAGAGTCCTCAGGGAAGACAATCGAAGACTTCAGGCTCAACTGAG 
TC ATGTTTCCAGAG AGCACTCC CAGGAAACAGAAAGCCTG AGGG AGG CTCTGCTGTCC 
TCTCG AT CCCACCTTCAAGAGCTGG AAAAGGAG CTGGAGCACCAG AAGGTGG AAAGGC 
AG CAG CTTTTGG AAGACTTG AGG G AG AAG CAG CAAG AGG TCTTG CATTT CAGGG AGG A 
ACGTCTTTCCCTCCAGGAAAACGACTCCAGACTGCAGCACAAGCTGGTTCTCCTGCAG 
CAACAGTGTG AAGAGAAACAG CAG CTCTTTGAGTCCCTCCAGTCAG AGC TAC AAATCT 
ACGAGGCACTTTATGGCAATTCCAAGAAGGGGCTGAAAGGCTTGGGTTTGGATACTTC 
TCCAGTAATGAAGACCCCTCCCAAGCTAGAGGGTGATGCTACTGATGGCTCCTTTGCC 
AATAAGCATGGCCGCCATGTCATTGGCCACATTGATGACTACAGTGCCCTAAGACAGC 
AGATTGCGGAGGG CAAG CTGCTGGTCAAAAAG ATAGTGTCTCTTGTGAGATCAGCGTG 
CAGCTT C C CTGG CCTTGAAGCC C AAGG C A CAGAGG TG CTAGG CAG C AAAGGT ATT CAT 
GAGCTTCGGAGCAGCACCAGTGCCCTGCACCATGCCCTAGAGGAGTCGGCTTCCCTCC 
TCACCATGTTCTGGAGAGCAGCCCTGCCAAGCACCCACATCCCTGTGCTGCCTGGCAA 
AGTGGGAGAATCAACAGAAAGGGAACTTCTGGAACTGAGAACCAAAGTATCCAAACAG 
GAGCGGCTCCTTCAGAGCACAACTGAGCATCTGAAGAACGCCAACCAGCAGAAGGAGA 
GCATGG AG CAGTT CAT CGT CAG C C AG CT AACC AG AACAC ATG ATGTTTT AAAG AAGG C 
AAGGACTAACTT AG AGGTG AAATCCCT AAGGG CT CTGCCATGTACTCCAGCCTTGTGA 
CCCTTGCCTTCCAGGAACC ATG CAAG AAGCGCAG CCACCAG AAGTCCTTAAAACAGCA 


GGAAAGGTGGGCCTGTCCCCCTTTTGTGCAGCTACCTATCTGCTGAGGAGCATCTGGG 


CCTCATTCCTCCAAGTCCACGGGAGGGTCCAGAAGAGGGAGTCAGAGATGTATCCTGG 


TGGAGCTGGGAGAAAGGCAGAAAGCCTTTCTGACAGCTATGGAATACGATTAGCCAAG 


GTCCACTTGGCCCAGCACTAAGAAAAAGATGCGTAGTTTGCACAGAAGGTTTTGTGAT 


CCrcCCTCTCAACAGCCCCAGCAGCTTGGGAACTAGCAAGAGCACATTTCTTGCCTCA 


TCAGCTCTCCTGAGATGGAAAACTCAGTGGATATAGGACCCTGATTCCGATGAAAGGG 


GCACGTGGTCCCAATGCTGGAGCTCCTCTGGCAGGTTCTAAAAGCACACTACTGAGCA 


GCGGTGCCCTGCCGGACACTGCTGGCGGGGGCTCAGTGAGCACTACTCACAGATCCAC 


ACCTGACCCTGTTGGGTCGAGTCAGGCTGGGCCTTGGTCTGCACTGTAGCACCTGTGT 


TCTTTGAGTTCACATCATGAATGTGGTGACTTCCCAGATACCATCTCAGGCTTAACCT 


AGCACATCCTATTTCTTTTCTTCTATGATATCCAAATTGGACTGACCTCACTTCAAAG 


TTGCTGTCCCATTTTGTCACCCTATCTTATCTCGGGGAAATTGCAGACTGATGGCCAG 


ACCAACTCIX3TTGAAATT CTTGCAT AG AGCAAACCTGTGCTC^TTTTT AAGTGG CATG 


GGAGAGGCCCCCAGCCTAGTAAAGCCTAGTCTGTGTCTTCACAGTGCTGGTAGAATGT 


GTTTGTGTGTATAAATATATGATATAGATTTATATATGTTGCTAACGCCATATATTGA 


AGGCCAACATAACTGGTGGACAGGGTGGGTGACAGAAAATGAAAGCCTTTTTGGTGAT 


TGTTAAAGCAAGATGTGTATAAAGAAATAAATAGTTTTTCTTTC 




ORF Start: ATG at 658 


ORF Stop: TGA at 7828 




SEQ ID NO: 292 


2390 aa MW at 268843.7kD 


NOV 103a, 

CG59773-01 Protein Sequence 


MKEICRICARELCGNQRRWIFHTASKLNLQVLLSHVLGKDVPRDGKAEFACSKCAFML 
DRIYRFDTVIARIEALSIERLQKLLLEKDRLKFCIASMYRKNNDDSGAEIKAGNGTVD 
MSVLPDARYSALLQEDFAYSGFECWVENEDQIQEPHSCHGSEGPGNRPRRCRGCAALR 
VADSDYEAICKVPRKVARSISCGPSSRWSTSICTEEPALSEVGPPDLASTKVPPDGES 
MEEETPGSSVESLDASVQASPPQQKDEETERSAKELGKCDCCSDDQAPQHGCNHKLEL 
ALSMIKGLDYKPIQSPRGSRLPIPVKSSLPGAKPGPSMTDGVSSGFLNRSLKPLYKTP 
VSYPLELSDLQELWDDLCEDYLPLRVQPMTEELLKQQKLNSHETTITQQSVSDSHLAE 
LQEKIQQTEATNKILQEKLNEMSYELKCAQESSQKQDGTIQNLKETLKSRERETEELY 
QVIEGQNDTMAKLREMLHQSQLGQLHSSEGTSPAQQQVALLDLQSALFCSQLEIQKLQ 
RVVRQKERQLADAKQCVQFVEAAAHESEQQKEASWKHNQELRKALQQLQEELQNKSQQ 
LRAWEAEKYNEIRTQEQNIQHLNHSLSHKEQLLQEFRELLQYRDNSDKTLEANEMLLE 
KLRQRIHDKAVALERAIDEKFSALEEKEKELRQLRLAVRERDHDLERLRDVLSSNEAT 
MQSMESLLRAKGLEVEQLSTTCQNLQWLKEEMETKFSRWQKEQESIIQQLQTSLHDRN 
KEVEDLSATLLCKLGPGQSEIAEELCQRLQRKERMLQDLLSDRNKQVLEHEMEIQGLL 
QSVSTREQESQAAAEKLVQALMERNSELQALRQYLGGRDSLMSQAPISNQQAEVTPTG 
RLGKQTDQGSMQIPSRDDSTSLTAKEDVSIPRSTLGDLDTVAGLEKELSNAKEELELM 
AKKERESQMELSALQSMMAVQEEELQVQAADMESLTRNIQIKEDLIKDLQMQLVDPED 
IPAMERLTQEVLLLREKVASVESQGQEISGNRRQQLLLMLEGLVDERSRLNEALQAER 
QLYSSLVKFHAHPESSERDRTLQVELEGAQVLRSRLEEVLGRSLERLNRLETLAAIGG 
AAAGDDTEDTSTEFTDSIEEEAAHHSHQQLVKVALEKSLATVETQNPSFSPPSPMGGD 
SNRCLQEEMLHLRAEFHQHLEEKRKAEEELKELKAQIEEAGFSSVSHIRNTMLSLCLE 
NAELKEQMGEAMSDGWEIEEDKEKGEVMVETVVTKEGLSESSLQAEFRKLQGKLKNAH 
NIINLLKEQLVLSSKEGNSKLTPELLVHLTSTIERINTELVGSPGKHQHQEEGNVTVR 
PFPRPQSLDLGATFTVDAHQLDNQSQPRDPGPQSAFSLPGSTQHLRSQLSQCKQRYQD 
LQEKLLLSEATVFAQANELEKYRVMLTGESLVKQDSKQIQVDLQDLGYETCGRSENEA 
EREETTSPECEEHNSLKEMVLMEGLCSEQGRRGSTLASSSERKPLENQLGKQEEFRVY 
GKSENILVLRKDIKDLKAQLQNANKVIQNLKSRVRSLSVTSDYSSSLERPRKLRAVGT 
LEGSSPHSVPDEDEGWLSDGTGAFYSPGLQAKKDLESLIQRVSQLEAQLPKNGLEEKL 
AEELRSASWPGKYDSLIQDQARELSYLRQKIREGRGICYLITRHAKDTVKSFEDLLRS 
NDIDYYLGQSFREQLAQGSQLTERLTSKLSTKDHKSEKDQAGLEPLALRLSRELQEKE 
KVIEVLQAKLDARSLTPSSSHALSDSHRSPSSTSFLSDELEACSDMDIVSEYTHYEEK 
KASPSHSDSIHHSSHSAVLSSKPSSTSASQGAKAESNSNPISLPTPQNTPKEANQAHS 
GFHFHSIPKLASLPQAPLPSAPSSFLPFSPTGPLLLGCCETPVVSLAEAQQELQMLQK 
QLGESASTVPPASTATLLSNDLEADSSYYLNSAQPHSPPRGTIELGRILEPGYLGSSG 
KWDVMRPQKGSVSGDLSSGSSVYQLNSKPTGADLLEEHLGEIRNLRQRLEESICINDR 
LREQLEHRLTSTARGRGSTSNFYSQGLESIPQLCNENRVLREDNRRLQAQLSHVSREH 
SQETESLREALLSSRSHLQELEKELEHQKVERQQLLEDLREKQQEVLHFREERLSLQE 
NDSRLQHKLVLLQQQCEEKQQLFESLQSELQIYEALYGNSKKGLKGLGLDTSPVMKTP 
PKLEGDATDGSFANKHGRHVIGHIDDYSALRQQIAEGKLLVKKIVSLVRSACSFPGLE 
AQGTEVLGSKGIHELRSSTSALHHALEESASLLTMFWKAALPSTHIPVLPGKVGESTE 
RELLELRTKVSKQERLLQSTTEHLKNANQQKESMEQFIVSQLTRTHDVLKKARTNLEV 
KSLRALPCTPAL 




SEQ ID NO: 293 


7161 bp 


NOV103b, 

CG59773-02 DNA Sequence 


GTTGAGGGGGCAATCGGGCACGCTCCTCCCCATGGGTTGCCCATCATGTCTAATGGAT 


ATCGCACTCTGTCCCAGCACCTCAATGACCTGAAGAAGGAGAACTTCAGCCTCAAGCT 
G CG CATCT ACTTCCTGGAGG AG CGC ATGCAACAGAAGTATG AGGCCAGCCGGG AGGAC 
AT CTACAAGCGG AACATTG AG CTG AAGGTTG AAGTGGAGAG CTTGAAACGAGAACTCC 
AGGACAAGAAACAGCATCTGGATAAAACATGGGCTGATGTGGAGAATCTCAACAGTCA 
G AATGAAGCTGAGCTCCGACGCCAGTT TGAGGAGCGACAG CAGGAGACGGAGCATGTT 
TATGAGCTCTTGGAGAATAAGATCCAGCTTCTGCAGG AGG AATCCAGG CTAG C AAAGA 
ATGAAGCTGCGCGGATGGCAGCTCTGGTGGAAGCAGAGAAGGAGTGTAACCTGGAGCT 
CTCAG AG AAACTG AAGGG AGT C ACCAAAAACTGGGAAGATGT ACCAGG AGAC CAGGTC 
AAGCCCGACCAATACACTGAGGCCCTGGCCCAGAGGGACAGGAGAATTGAAGAACTGA 
ATC AGAGCCTGGCTGC CCAGGAGAGG CTTGTAG AACAG CTATCTCGGGAGAAACAACA 
ACTGCTACATCTGTTGGAGG AGCCAACTAGCATGGAAGTG CAG CCCATG ACTG AAG AG 
TTGCTGAAACAACAAAAGCTGAATTCACATGAGACCACTATAACTCAGCAGTCTGTAT 
CTGATTCCCACTTGGCAGAACTCCAGGAAAAAATCCAGCAAACAGAGGCCACCAACAA 
GATTCTTCAAGAGAAACTTAATGAAATGAGCTATGAACTAAAGTGTGCTCAGGAGTCG 
T CT CAAAAGCAAGATGGTACAATTCAGAACCTCAAGG AAACTCTG AAAAG CAGGGAAC 
GTGAGACTGAGGAGTTGTACCAGGTAATTGAAGGTCAAAATGACACAATGGCAAAGCT 
TCG AGAAATG CTGCAC CAAAGCCAGCTTGG ACAACTTC AG AGCTCAGAGGGT ACTTCT 
CCAGCTCAGCAACAGGTAGCTCTGCTTGATCTTCAGAGTGCTTTATTCTGCAGCCAAC 
TTGAAATACAGAAGCTCCAGAGGGTGGTACGACAGAAAGAGCGCCAACTGGCTGATGC 
CAAACAATGTGTGCAATTTGTAGAGGCTGCAGCACACGAGAGTGAACAGCAGAAAGAG 
GCTTCTTGGAAACAT AACCAGG AATTGCGAAAAGCCTTG CAG CAGCTACAAG AAG AAT 
TGCAGAATAAGAGCCAACAGCTTCGTGCCTGGGAGGCTGAAAAATACAATGAGATTCG 
AACCCAGGAACAAAACATCCAGCACCTAAACCATAGTCTGAGTCACAAGGAGCAGTTG 
CTTCAGGAATTTCGGGAGCTCCTACAGTATCGAGATAACTCAGACAAAACCCTTGAAG 
CAAATGAAATGTTGCTTGAGAAACTTCGCCAGCGAATACATGATAAAGCTGTTGCTCT 
GGAGCGGGCTATAGATGAAAAATTCTCTGCTCTAGAAGAGAAAGAAAAAGAACTGCGC 
CAGCTTCGTCTTGCTGTGAGAGAGCGAGATCATGACTTAGAGAGACTGCGCGATGTCC 
TCTCCTCCAATGAAGCTACTATGCAAAGTATGGAGAGTCTCCTGAGGGCCAAAGGCCT 
GGAAGTGGAACAGTTATCTACTACCTGTCAAAACCTCCAGTGGCTGAAAGAAGAAATG 
GAAACCAAATTTAGCCGTTGGCAGAAGGAACAAGAGAGTATCATTCAGCAGTTACAGA 
CGTCTCTTCATGATAGGAACAAAGAAGTGGAGGATCTTAGTGCAACACTGCTCTGCAA 
ACTTGGACCAGGGCAGAGTGAGATAGCAGAGGAGCTGTGCCAGCGTCTACAGCGAAAG 
GAAAGGATGCTGCAGGACCTTCTAAGTGATCGAAATAAACAAGTGCTGGAACATGAAA 
TGG AG ATTCAAGGCCTGCTTCAGTCTGTGAGCACCAGGG AG C AGG AAAGC CAAGCTG C 
TGCAGAGAAGTTGGTGCAAGCCTTAATGGAAAGAAATTCAGAATTACAGGCCCTGCGC 
CAATATTTAGGAGGGAGAGACTCCCTGATGTCCCAAGCACCCATCTCTAACCAACAAG 
CTGAAGTTACCCCCACTGGCCGTCTTGGAAAACAGACTGATCAAGGTTCAATGCAGAT 
ACCTTCCAGAGATGATAGCACTTCATTGACTGCCAAAGAGGATGTCAGCATACCCAGA 
TCCACATTAGGAGATTTGGACACAGTTGCAGGGCTGGAAAAAGAACTGAGTAATGCCA 
AAGAGGAACTTGAACTCATGGCTAAAAAAGAAAGAGAATCACAGATGGAACTTTCTGC 
TCTACAGTCCATGATGGCTGTGCAGGAAGAAGAGCTGCAGGTGCAGGCTGCTGATATG 
GAGTCTCTGACCAGGAACATACAGATTAAAGAAGATCTCATAAAGGACCTGCAAATGC 
AACTGGTTGATCCTGAAGACATACCAGCTATGGAACGCCTGACCCAGGAAGTCTTACT 
TCTTCGGGAAAAAGTTGCTTCAGTAGAATCCCAGGGTCAAGAAATTTCAGGAAACCGA 
AG ACAACAGCAGTTGCTG CTG ATGCT AGAAGGACT AGTAG ATGAACGGAGT CGGCTCA 
ATGAGGCCTTACAAGCAGAGAGACAGCTCTATAGCAGTCTGGTGAAGTTCCATGCCCA 

TCCAGAGAGCTCTGAGAGAGACCGAACTCTGCAGGTGGAACTGGAAGGGGCTCAGGTG 

TTACGCAGTCGGCTAGAAGAAGTTCTTGGAAGAAGCTTGGAGCGCTTAAACAGGCTGG 

AG AC CCTGGCCGCCATTGGAGGTGCAGCTG CAGGGG ATGACACCG AAGATACAAG CAC 

TGAGTTCACTGACAGTATTGAGGAGGAGGCTGCACACCATAGTCACCAGCAACTTGTC 

AAGGTGGCTTTGGAGAAAAGTCTGGCAACTGTGGAGACCCAGAACCCATCTTTTTCCC 

CTCCTTCTCCGATGGG AGGGG ACAGT AACAGGTGTCTTCAGGAAGAAATG CTC CACCT 

GAGGGCTGAGATCCACCAGCACTTAGAAGAGAAGAGGAAAGCTGAGGAGGAACTGAAG 

GAGCT AAAGGCTCAAATTG AGG AAG CAGG ATT CT CCTCAGTGTCC CACATCAGGAACA 

CCATGCTGAGCCTTTGCCTTGAGAATGCGGAGCTGAAAGAGCAGATGGGAGAAACAAT 

GTCTG ATGGATGGG AGAT CG AGGAAGACAAGGAGAAGGG CGAGGTGATGGTTG AG ACT 

GTGGTAACCAAAGAGGGTCTGAGTGAGAGTAGCCTTCAGGCTGAGTTCAGAAAGCTCC 

AGGGAAAACTGAAGAATGCCCACAATATCATCAACCTCCTCAAAGAACAACTTGTGCT 

GAGTAGCAAGGAAGGGAATAGTAAACTTACTCCAGAGCTCCTTGTGCATCTGACCAGC 

ACCATCGAAAGAATAAACACAGAACTGGTTGGTTCCCCTGGGAAGCACCAACACCAAG 

AGGAGGGGAATGTGACTGTGAGGCCTTTCCCCAGACCCCAGAGCCTTGACCTTGGGGC 

TACCTTCACAGTGGATGCCCACCAACAGTTGGATAACCAGTCCCAGCCTCGTGACCCT 

GGGCCTCAGCCAGCGTTTAGCCTACCAGGGTCCACCCAGCACCTGCGCTCCCAGCTGT 

CACAATGCAAACAACGCTATCAAGATCTCCAGGAGAAGCTGCTGCTATCAGAAGCCAC 

TGTCTTTGCTCAGGCTAACG AG CTGGAGAAATACAGAGTTATGCTT AGTG AAT CCTTG 

GTG AAG CAGG AC AG C AAG C AGATCCAGGTGG ACTTCC AGGAC CTGGGCTATGAG ACTT 

GTGGCCGAAGCGAGAATGAGGCTGAACGGGAGGAAACCACCAGTCCTGAGTGTGAGGA 

GCACAACAGCCTCAAGGAAATGGTCCTGATGGAGGGGCTGTGCTCTGAGCAGGGACGC 

CGGGGCTCAACACTGGCT AGTTCCTCTGAGAGG AAG CCCTTGGAG AACCAG CT AGGG A 

AGCAGGAAGAGTTCCGGGTATATGGAAAGTCAGAAAACATCTTGGTCCTACGAAAGGA 

CAT CGAAG AT CTG AAGG C C CAG CTGCAG AATG C C AAC AAGGT CATT C AAAA CCT C AAG 

AGCCGGGTCCGGTCC CTCTCAGTTACAAGTG ATTATT CGTCTAGTCTGG AAAG ACCCC 

GGAAGCTG AGAGCTGTTGG CACCTTGG AGGGGT CTTCACCTCATAGTGTC C CTGATG A 

GGATGAGGGGTGGCTGTCTGATGGCACTGGGGCTTTCTACTCTCCAGGGCTTCAGGCC 

AAAAAGGACCTGGAGAGTCTCATCCAGAGAGTATCCCAGCTGGAGGCCCAGCTCCCAG 

AAAATGGACTAG AAGAG AAG CTGGCTGAGGAGCTGAG ATCAGC CTCGTGGCCTGGGAA 

ATATGATTCCCTGATTCAGGATCAGGCCCGGGAACTGTCTTACCTACGGCAAAAAATA 

CGAG AAG GG AG AGGT ATTTGTT ATCTT AT CAC C CAG CATG CAAAAG AT ACAGT AAAAT 

CTTTTG AGG ATC TC CT AAGGAG C AATG ACATTG A CT ACT A C CTGGG ACAG AG CTTCCG 

GGAGCAACTCGCCCAGGGAAGCCAGCTGACAGAGAGGCTCACCAGCAAACTCAGCACA 

GAGG ATCATAAAAGTG AGAAAGATCAAGCTGGACTTGAG C CACTGGCC CTCAGGCTCA 

GCAGGGAGCTGCAGGAGAAGGAGAAAGTGATTGAAGTCCTGCAGGCCAAGCTGGATGC 

TCGGTCCCTCACACCCTCCAGCAGCCGTGCCTTGTCTGACTCCCACCGCTCTCCCAGC 

AGCACCTCTTTCCTGTCTGATGAGCTGGAAGCCTGCTCTGACATGGACATAGTCAGCG 

AGTACACACACTATGAAGAGAAGAAAGCTTCTCCCAGTCACTCAGGTAGCAGTGCATC 

TCAGGGGGCTAAGGCCGAATC CAACAG C AACCCCATC AGCTTGCCAACTCC CCAGAAT 

ACCCCCAAGGAGGCCAACCAAGCCCATTCAGGCTTTCATTTTCACTCCATACCCAAGC 

TGGCTAGCCTTCCTCAGGCACCATTGCCCTCAGCTCCATCCAGCTTCCTGCCTTTCAG 

CCCCACTGGCCCTCCCCTCCTTGGCTGCTGTGAGACACCAGAGGTCTCCTTGGCTGAG 

TCT CAG C AGGAGCT ACAGATG CTG CAG AAGCAGTTGGG AGAAAGT AG CACTGTTC CTC 

CTGCTTCCACAGCTACATTGCTGAGCAACGACTTGGAAGCCGACTCTTCCTACTACCT 

CAACT CTGC C CAG CCTCACTCT C CTCCAAGGGG CAC CAT AG AACTGGG AAG AATCCT A 

GAGCCTGGGTACCTGGGCAGCAGTGGCAAGTGGGATGTGATGAGGCCTCAGAAAGGGA 

GTGTATCTGGGGACCTATCCTCAGGCTCCTCTGTGTACCAGCTTAACTCCAAACCCAC 

AGGGG CTGACCTGCTGGAAGAGCATCTTGGTGAAATCTGGAACCTGCGCCAGCGCCTG 

GAGGAGTCCATCTGCATCAATGACTGCCTACGGGAGCAACTGGAACACCGGCTGACCT 

CTACTGCTCGTGGAAGGGGATCCACTTCTAACTTCTACAGTCAGGGCCTGGAGTCCAT 

ACCTCAGCTCTGCAATGAGAACAGAGTCCTCAGGGAAGAAAATCGAAGACTTCAGGCT 

CAACTGAGT CATGTTTCCAGAGGTCACTCCCAGGAAACAGAAAG CCTGAGGGAGG CTC 

TG CTGTC CT CT CG AT CC CAC CTT CAAG AG CTGG AAAAGG AG CTGG AG C ACC AG AAGGT 

GG AAAGG CAG CAGCTTTTGGAAG ACTTGAGGGAGAAG C AGCAAGAGGTCTTGCATTTC 

AGGGAGGAACGTCTTT CCCTCCAGGAAAACGACTC CAGACTGCAGC ACAAG CTGGTTC 

TCCTGCAGCAACAGTGTGAAGAGAAACAGCAGCTCTTTGAGTCCCTCCAGTCAGAGCT 

ACAAATCTACGAGGCACTTTATGGCAATTCCAAGAAGGGGCTGAAAGCTTACAGCCTG 

GATGCCTGTCACCAAATCCCTTTGAG CAGTG AC CTGAGCCACCTGGTGG CAGAGGTAC 

AAGCTCTGAGAGGGCAGCTGGAGCAGAGCATTCAGGGGAACAATTGTCTGCGACTGCA 

GCTGCAACAGCAGCTGGAGAGCGGTGCTGGCAAAGCCAGCCTCAGCCCCTCCTCCATT 

AACCAGAACTTCC C AGCCAG CACTG ACCCTGGAAACAAG CAGCTG CT CCTCCAAGGTT 

CAGCTGTGTCCCCTCCAGTCCGGGATGTTGGTATGAATTCCCCAGCTCTGGTCTTCCC 

CAGCTCTGCTTCCTCTACTCCTGGCTCAGATTCAGTTGTGTTGTCATTTTCTTTTTCA 

GG CTTGGGTTTGGATACTTCTCCAGTAATGAAGACCCCTCCCAAG CT AGAGGGTGATG 

CTACTGATGGCTCCTTTGCCAATAAGCATGGCCGCCATGTCATTGGCCACATTGATGA 

CTACAGTG CCCTAAGACAGCAGATTGCGGAGGG CAAGCTGCTGGTCAAAAAGATAGTG 

TCTCTTGTGAGATCAGCGTGCAGCTTCCCTGGCCTTGAAGCCCAAGGCACAGAGGGCA 

GCAAAGGCATTCATGAGCTTCGGAGCAGCACCAGTG CCCTG CACCATGCCCTAGAGGA 

GTCGGCTTCCCTCCTCACCATGTTCTGGAGAGCGGCCCTGCCAAGCACCCACATCCCT 

GTGCTGCCTGGCAAACAGGGAGAATCAACAGAAAGGGAACTTCTGGAACTGAGAACCA 

AAGTATCCAAACAGGAGCAGCTCCTTCAGAGCACAACTGAGCATCTGAAGAACGCCAA 

CCAGCAGAAGGAGAGCATGGAACAGTTCATTGTCAGCGTAACCAGAACACATGATGTT 

TT AAAG AAGG C AAGG ACT AACTTAG AGGTG AAAT C CCT AAGGGCT CTGC CGT GT A CT C 

CAGCCTTGTGACCCTTGCCTTCCAGGAACCATGCAAGAAGCGCAGCCACCAGAAGTCC 

TTAAAACAG CAGGAAAGGTGAGCCTGTCCCCCTTTTGTG CAG CTACCTATCTG CTGAG 
GAGCATCTGGGCCTCATTCCTCCAAGT 




ORF Start: ATG at 46 


ORF Stop: TGA at 7027 




SEQ ID NO: 294 


2327 aa MW at 263034.6kD 


NOV 103b, 

CG59773-02 Protein Sequence 


MSNGYRTLSQHLNDLKKENFSLKLRIYFLEERMQQKYEASREDIYKRNIELKVEVESL 
KRELQDKKQHLDKTWADVENLNSQNEAELRRQFEERQQETEHVYELLENKIQLLQEES 
RLAKNEAARMAALVEAEKECNLELSEKLKGVTKNWEDVPGDQVKPDQYTEALAQRDRR 
IEELNQSLAAQERLVEQLSREKQQLLHLLEEPTSMEVQPMTEELLKQQKLNSHETTIT 
QQSVSDSHLAELQEKIQQTEATNKILQEKLNEMSYELKCAQESSQKQDGTIQNLKETL 
KSRERETEELYQVIEGQNDTMAKLREMLHQSQLGQLQSSEGTSPAQQQVALLDLQSAL 
FCSQLEIQKLQRVVRQKERQLADAKQCVQFVEAAAHESEQQKEASWKHNQELRKALQQ 
LQEELQNKSQQLRAWEAEKYNEIRTQEQNIQHLNHSLSHKEQLLQEFRELLQYRDNSD 
KTLEANEMLLEKLRQRIHDKAVALERAIDEKFSALEEKEKELRQLRLAVRERDHDLER 
LRDVLSSNEATMQSMESLLRAKGLEVEQLSTTCQNLQWLKEEMETKFSRWQKEQESII 
QQLQTSLHDRNKEVEDLSATLLCKLGPGQSEIAEELCQRLQRKERMLQDLLSDRNKQV 
LEHEMEIQGLLQSVSTREQESQAAAEKLVQALMERNSELQALRQYLGGRDSLMSQAPI 
SNQQAEVTPTGRLGKQTDQGSMQIPSRDDSTSLTAKEDVSIPRSTLGDLDTVAGLEKE 
LSNAKEELELMAKKERESQMELSALQSMMAVQEEELQVQAADMESLTRNIQIKEDLIK 
DLQMQLVDPEDIPAMERLTQEVLLLREKVASVESQGQEISGNRRQQQLLLMLEGLVDE 
RSRLNEALQAERQLYSSLVKFHAHPESSERDRTLQVELEGAQVLRSRLEEVLGRSLER 
LNRLETLAAIGGAAAGDDTEDTSTEFTDSIEEEAAHHSHQQLVKVALEKSLATVETQN 
PSFSPPSPMGGDSNRCLQEEMLHLRAEIHQHLEEKRKAEEELKELKAQIEEAGFSSVS 
HIRNTMLSLCLENAELKEQMGETMSDGWEIEEDKEKGEVMVETVVTKEGLSESSLQAE 
FRKLQGKLKNAHNIINLLKEQLVLSSKEGNSKLTPELLVHLTSTIERINTELVGSPGK 
HQHQEEGNVTVRPFPRPQSLDLGATFTVDAHQQLDNQSQPRDPGPQPAFSLPGSTQHL 
RSQLSQCKQRYQDLQEKLLLSEATVFAQANELEKYRVMLSESLVKQDSKQIQVDFQDL 
GYETCGRSENEAEREETTSPECEEHNSLKEMVLMEGLCSEQGRRGSTLASSSERKPLE 
NQLGKQEEFRVYGKSENILVLRKDIEDLKAQLQNANKVIQNLKSRVRSLSVTSDYSSS 
LERPRKLRAVGTLEGSSPHSVPDEDEGWLSDGTGAFYSPGLQAKKDLESLIQRVSQLE 
AQLPENGLEEKLAEELRSASWPGKYDSLIQDQARELSYLRQKIREGRGICYLITQHAK 
DTVKSFEDLLRSNDIDYYLGQSFREQLAQGSQLTERLTSKLSTEDHKSEKDQAGLEPL 
ALRLSRELQEKEKVIEVLQAKLDARSLTPSSSRALSDSHRSPSSTSFLSDELEACSDM 
DIVSEYTHYEEKKASPSHSGSSASQGAKAESNSNPISLPTPQNTPKEANQAHSGFHFH 
SIPKLASLPQAPLPSAPSSFLPFSPTGPPLLGCCETPEVSLAESQQELQMLQKQLGES 
STVPPASTATLLSNDLEADSSYYLNSAQPHSPPRGTIELGRILEPGYLGSSGKWDVMR 
PQKGSVSGDLSSGSSVYQLNSKPTGADLLEEHLGEIWNLRQRLEESICINDCLREQLE 
HRLTSTARGRGSTSNFYSQGLESIPQLCNENRVLREENRRLQAQLSHVSRGHSQETES 
LREALLSSRSHLQELEKELEHQKVERQQLLEDLREKQQEVLHFREERLSLQENDSRLQ 
HKLVLLQQQCEEKQQLFESLQSELQIYEALYGNSKKGLKAYSLDACHQIPLSSDLSHL 
VAEVQALRGQLEQSIQGNNCLRLQLQQQLESGAGKASLSPSSINQNFPASTDPGNKQL 
LLQGSAVSPPVRDVGMNSPALVFPSSASSTPGSDSVVLSFSFSGLGLDTSPVMKTPPK 
LEGDATDGSFANKHGRHVIGHIDDYSALRQQIAEGKLLVKKIVSLVRSACSFPGLEAQ 
GTEGSKGIHELRSSTSALHHALEESASLLTMFWRAALPSTHIPVLPGKQGESTERELL 
ELRTKVSKQEQLLQSTTEHLKNANQQKESMEQFIVSVTRTHDVLKKARTNLEVKSLRA 
LPCTPAL 




SEQ ID NO: 295 


7084 bp 


NOV103c, 

CG59773-03 DNA Sequence 


GTTGAGGGGGCAATCGGGCACGCTCCTCCCCATGGGTTGCCCATCATGTCTAATGGAT 


ATCGCACTCTGTCCCAG CACCTCAATGACCTG AAGAAGGAGAACTTCAGC CTCAAGCT 


G CTCATCT A CTT C CTGG AGG AG CG C ATG CAACAGAAG T ATGAGGC CAG CCGG G AGG AC 


ATCTACAAG CGGGGGTG ATGTGGAGAATCTCAACAGT CAGAATGAAGCTG AGCTCCG A 
CGCCAGTTTGAGGAGCGACAGCAGGAGACGGAGCATGTTTATGAGCTCTTGGAGAATA 
AGAT CCAGCTT CTGCAGGAGGAATCCAGGCTAG CAAAG AATG AAG CTG CG CGGATGGC 
AGCTCTGGTGG AAGCAG AG AAGG AGTGTAACCTGG AG CTCT CAGAGAAACTGAAGGG A 
GTCACC AAAAACTGGGAAGATGTACCAGGAG ACCAGGT CAAGCCCG AC C AATACACTG 
AGAC CCTGG CCCAGAGGGACAAGAGAATTGAAGAACTG AATCAG AGCCTGG CTGCCCA 
GGAGAGGCTTGTAGAACAGCTATCTCGGGAGAAACAACAACTGCTACATCTGTTGGAG 
G AGCCAACTAGCATGGAAGTG CAGCCCATG ACTG AAG AGTTG CTGAAACAACAAAAGC 
TG AATTCACATG AGACCACTATAACTCAGCAGTCTGTATCTGATT CCCACTTGG CAGA 
ACTCCAGGAAAAAATCCAGCAAACAGAGGCCACCAACAAGATTCTTCAAGAGAAACTT 
AATGAAATGAGCTATGAACTAAAGTGTGCTCAGGAGTCGTCTCAAAAGCAAGATGGTA 
CAATTCAGAACCTCAAGGAAACTCTGAAAAGCAGGGAACGTGAGACTGAGGAGTTGTA 
CCAGGTAATTGAAGGTCAAAATGACACAATGGCAAAGCTTCGAGAAATGCTGCACCAA 
AG CCAGCTTGG ACAACTTCAGAGCTCAGAGGGTACTTCTCCAG CTCAG CAACAGGT AG 
CTCTGCTTGATCTTCAGAGTGCTTTATTCTGCAGCCAACTTGAAATACAGAAGCTCCA 
GAGGGTGGTACGACAGAAAGAGCGCCAACTGGCTGATGCCAAACAATGTGTGCAATTT 
GTAGAGGCTGCAGCACACGAGAGTGAACAGCAGAAAGAGGCTTCTTGGAAACATAACC 
AGGAATTGCG AAAAGCCTTGCAGC AG CTACAAGAAGAATTG CAGAAT AAG AGCCAACA 
GCTTCGTGC CTGGGAGG CTGAAAAAT ACAATGAGATT CGAACC C AGG AAC AAAAC ATC 
CAGCACCTAAACCATAGTCTGAGTCACAAGGAGCAGTTGCTTCAGGAATTTCGGGAGC 
TCCTACAGTATCGAGATAACTCAGACAAAACCCTTGAAGCAAATGAAATGTTGCTTGA 
GAAACTTCGCCAGCGAATACATGATAAAGCTGTTGCTCTGGAGCGGGCTATAGATGAA 
AAATTCTCTGCTCTAGAAGAGAAAGAAAAAGAACTGCGCCAGCTTCGTCTTGCTGTGA 
GAGAGCGAGATCATGACTTAGAGAGACTGCGCGATGTCCTCTCCTCCAATGAAGCTAC 
TATGCAAAGTATGGAGAGTCTCCTGAGGGCCAAAGGCCTGGAAGTGGAACAGTTATCT 
ACTACCTGTCAAAACCTCCAGTGGCTGAAAGAAGAAATGGAAACCAAATTTAGCCGTT 
GG CAGAAGGAACAAGAGAGT AT CATTCAGC AGTTACAG ACGTCTCTTCATGATAGG AA 
CAAAGAAGTGGAGGATCTTAGTGCAACACTGCTCTGCAAACTTGGACCAGGGCAGAGT 
GAGATAGCAGAGGAGCTGTGCCAGCGTCTACAGCGAAAGGAAAGGATGCTGCAGGACC 
TTCTAAGTGATCGAAATAAACAAGTGCTGGAACATGAAATGGAGATTCAAGGCCTGCT 
TCAGTCTGTGAGCACCAGGGAGCAGGAAAGCCAAGCTGCTGCAGAGAAGTTGGTGCAA 
GCCTTAATGGAAAGAAATTCAGAATTACAGGCCCTGCGCCAATATTTAGGAGGGAGAG 
ACTCCCTGATGTCCCAAGCACCCATCTCTAACCAACAAGCTGAAGTTACCCCCACTGG 
CCGTCTTGGAAAACAGACTGATCAAGGTTCAATGCAGATACCTTCCAGAGATGATAGC 
ACTTCATTGACTGCCAAAGAGGATGTCAGCATACCCAGATCCACATTAGGAGATTTGG 
ACACAGTTGCAGGGCTGGAAAAAGAACTGAGTAATGCCAAAGAGGAACTTGAACTCAT 
GGCTAAAAAAGAAAGAGAATCACAGATGGAACTTTCTGCTCTACAGTCCATGATGGCT 
GTGCAGGAAGAAGAGCTGCAGGTGCAGGCTGCTGATATGGAGTCTCTGACCAGGAACA 
TACAGATTAAAGAAGATCTCATAAAGGACCTGCAAATGCAACTGGTTGATCCTGAAGA 
CATACCAGCTATGGAACGCCTG ACCCAGG AAGT CTTACTTCTTCGGGAAAAAGTTG CT 
TCAGTAGAATCCCAGGGTCAAGAAATTTCAGGAAACCGAAGACAACAGCAGTTGCTGC 
TG ATGCTAGAAGG ACTAGTAGATGAACGGAGTCGG CTCAATGAGGCCTTACAAG CAG A 
GAGACAGCTCTATAGCAGTCTGGTGAAGTTCCATGCCCATCCAGAGAGCTCTGAGAGA 
GACCGAACTCTGCAGGTGGAACTGGAAGGGGCTCAGGTGTTACGCAGTCGGCTAGAAG 
AAGTTCTTGG AAG AAGCTTGGAGCG CTTAAACAGG CTGG AGACCCTGGCCGCCATTGG 
AGGTGCAGCTGCAGGGGATGACACCGAAGATACAAGCACTGAGTTCACTGACAGTATT 
GAGGAGGAGGCTGCACACCATAGTCACCAGCAACTTGTCAAGGTGGCTTTGGAGAAAA 
GTCTGGCAACTGTGGAGACCCAGAACCCATCTTTTTCCCCTCCTTCTCCGATGGGAGG 
GG ACAGTAACAGGTGTCT TC AGG AAGAAATGCTC CACCTG AGGGCTG AGATCC ACCAG 
CACTTAGAAGAGAAGAGGAAAGCTGAGGAGGAACTGAAGGAGCTAAAGGCTCAAATTG 
AGG AAGCAGGATTCTCCTCAGTGTCCCACAT CAGG AACACC ATG CTGAGCCTTTGCCT 
TGAGAATGCGGAGCTGAAAGAGCAGATGGGAGAAACAATGTCTGATGGATGGGAGATC 
GAGGAAGACAAGGAGAAGGGCGAGGTGATGGTTGAGACTGTGGTAACCAAAGAGGGTC 
TGAGTGAGAGTAGCCTTCAGGCTGAGTTCAGAAAGCTCCAGGGAAAACTGAAGAATGC 
CCACAATATC ATCAACCTCCT CAAAGAACAACTTGTG CTGAGTAGCAAGGAAGGG AAT 
AGT AAACTT ACTC CAG AG CT C CTTGTG CAT CTG AC CAG CACCAT CGAAAG AAT AAAC A 
CAGAACTGGTTGGTTCCCCTGGGAAGCACCAACACCAAGAGGAGGGGAATGTGACTGT 
GAGG CCTTTCCCCAGACCCCAGAGCCTTGACCTTGGGG CTACCTTCACAGTGG ATGCC 
CACCAACAGTTGGATAACCAGTCCCAGCCTCGTGACCCTGGGCCTCAGCCAGCGTTTA 
GCCTACCAGGGTCCACCCAGCACCTGCGCTCCCAGCTGTCACAATGCAAACAACGCTA 
TCAAGATCTCCAGGAGAAGCTGCTGCTATCAGAAGCCACTGTCTTTGCTCAGGCTAAC 
G AGCTGG AG AAAT ACAGAGTT ATG CTTAG TGAATCCTTGGTG AAGCAGGACAG CAAGC 
AG AT CCAGGTGG ACTTCCAGG ACCTGGGCT ATG AG ACTTGTGGCCGAAG CGAG AATGA 
GGCTGAACGGGAGG AAACCACCAGT CCTGAGTGTGAGG AGC ACAACAGCCTCAAGG AA 
ATGGTCCTGATGGAGGGGCTGTGCTCTGAGCAGGGACGCCGGGGCTCAACACTGGCTA 
GTTC CTCTGAG AGGAAG CC CTTGGAGAACCAGCT AGGG AAGCAGG AAGAGTTCCGGGT 
ATATGGAAAGTCAGAAAACATCTTGGTCCTACGAAAGGACATCGAAGATCTGAAGGCC 
CAGCTGCAGAATGCCAACAAGGTCATTCAAAACCTCAAGAGCCGGGTCCGGTCCCTCT 
CAGTTACAAGTGATTATTCGTCTAGTCTGGAAAGACCCCGGAAGCTGAGAGCTGTTGG 
CACCTTGGAGGGGTCTTCACCTCATAGTGTCCCTGATGAGGATGAGGGGTGGCTGTCT 
GATGGCACTGGGGCTTTCTACTCTCCAGGGCTTCAGGCCAAAAAGGACCTGGAGAGTC 
TCATCCAGAGAGTATCCCAGCTGGAGGCCCAGCTCCCAGAAAATGGACTAGAAGAGAA 
GCTGGCTGAGGAGCTGAGATCAGCCTCGTGGCCTGGGAAATATGATTCCCTGATTCAG 
GATCAGGCCCGGGAACTGTCTTACCTACGGCAAAAAATACGAGAAGGGAGAGGTATTT 
GTTATCTTATCACCCAGCATGCAAAAGATACAGTAAAATCTTTTGAGGATCTCCTAAG 
GAGCAATGACATTGACTACTACCTGGGACAGAGCTTCCGGGAGCAACTCGCCCAGGGA 
AGCC AG CTGACAGAGAGGCTCACC AGCAAACTCAG CACAGAGGATCATAAAAGTG AGA 
AAGATCAAGCTGGACTTGAGCCACTGGCCCTCAGGCTCAGCAGGGAGCTGCAGGAGAA 
GGAGAAAGTGATTGAAGTCCTGCAGGCCAAGCTGGATGCTCGGTCCCTCACACCCTCC 
AGCAGCCGTGCCTTGTCTGACTCCCACCGCTCTCCCAGCAGCACCTCTTTCCTGTCTG 
ATG AGCTGG AAG CCTG CTCTGACATGG ACAT AGTCAG CG AG T ACACACACT ATG AAG A 
GAAGAAAGCTTCTCCCAGTCACTCAGGTAGCAGTGCATCTCAGGGGGCTAAGGCCGAA 
TCCAACAGCAACCCCATCAGCTTGCCAACTCCCCAGAATACCCCCAAGGAGGCCAACC 
AAGCCCATTCAGGCTTTCATTTTCACTCCATACCCAAGCTGGCTAGCCTTCCTCAGGC 
ACCATTGCCCTCAGCTCCATCCAGCTTCCTGCCTTTCAGCCCCACTGGCCCTCCCCTC 
CTTGGCTGCTGTGAGACACCAGAGGTCTCCTTGGCTGAGTCTCAGCAGGAGCTACAGA 
TGCTGCAGAAGCAGTTGGGAGAAAGTAGCACTGTTCCTCCTGCTTCCACAGCTACATT 
GCTGAGCAACGACTTGGAAGCCGACTCTTCCTACTACCTCAACTCTGCCCAGCCTCAC 
TCTCCTCCAAGGGGCACCATAGAACTGGGAAGAATCCTAGAGCCTGGGTACCTGGGCA 
GCAGTGGCAAGTGGGATGTGATGAGGCCTCAGAAAGGGAGTGTATCTGGGGACCTATC 
CTCAGGCTCCTCTGTGTACCAGCTTAACTCCAAACCCACAGGGGCTGACCTGCTGGAA 
GAGCATCTTGGTGAAATCTGGAACCTGCGCCAGCGCCTGGAGGAGTCCATCTGCATCA 
ATGACTGCCTACGGGAGCAACTGGAACACCGGCTGACCTCTACTGCTCGTGGAAGGGG 
ATCCACTTCTAACTTCTACAGTCAGGGCCTGGAGTCCATACCTCAGCTCTGCAATGAG 
AACAGAGTCCTCAGGGAAGAAAATCGAAGACTTCAGGCTCAACTGAGTCATGTTTCCA 
GAGGTCACTCCCAGGAAACAGAAAGCCTGAGGGAGGCTCTGCTGTCCTCTCGATCCCA 
CCTTCAAGAGCTGGAAAAGGAGCTGGAGCACCAGAAGGTGGAAAGGCAGCAGCTTTTG 
GAAGACTTGAGGGAGAAGCAGCAAGAGGTCTTGCATTTCAGGGAGGAACGTCTTTCCC 
TCCAGGAAAACGACTCCAGACTGCAGCACAAGCTGGTTCTCCTGCAGCAACAGTGTGA 
AG AG AAACAGC AGCTCTTTGAGTCCCTCCAGTCAG AG CT ACAAATCT ACGAGG CACTT 
TATGGCAATTCCAAGAAGGGGCTGAAAGCTTACAGCCTGGATGCCTGTCACCAAATCC 
CTTTGAGCAGTGAC CTGAGCCACCTGGTGGCAGAGGTACAAGCTCTG AG AGGG CAGCT 
GGAGCAGAGCATTCAGGGGAACAATTGTCTGCGACTGCAGCTGCAACAGCAGCTGGAG 
AGCGGTGCTGGCAAAGCCAGCCTCAGCCCCTCCTCCATTAACCAGAACTTCCCAGCCA 
GCACTGACCCTGGAAACAAGCAGCTGCTCCTCCAAGGTTCAGCTGTGTCCCCTCCAGT 
CCGGGATGTTGGTATGAATTCCCCAGCTCTGGTCTTCCCCAGCTCTGCTTCCTCTACT 
CCTGGCTCAGATTCAGTTGTGTTGTCATTTTCTTTTTCAGGCTTGGGTTTGGATACTT 
CTCCAGTAATGAAGACCCCTCCCAAGCTAGAGGGTGATGCTACTGATGGCTCCTTTGC 
CAATAAGCATGGCCGCCATGTCATTGGCCACATTGATGACTACAGTGCCCTAAGACAG 
CAGATTGCGGAGGGCAAGCTGCTGGTCAAAAAGATAGTGTCTCTTGTGAGATCAGCGT 
GCAGCTTCCCTGGCCTTGAAGCCCAAGGCACAGAGGGCAGCAAAGGCATTCATGAGCT 
TCGGAGCAGCACCAGTGCCCTGCACCATGCCCTAGAGGAGTCGGCTTCCCTCCTCACC 
ATGTTCTGGAGAGCGGCCCTGCCAAGCACCCACATCCCTGTGCTGCCTGGCAAACAGG 
GAGAATCAACAGAAAGGGAACTTCTGGAACTGAGAACCAAAGTATCCAAACAGGAGCA 
GCTCCTTCAGAGCACAACTGAGCATCTGAAGAACGCCAACCAGCAGAAGGAGAGCATG 
GAACAGTTCATTGTCAGCGTAACCAGAACACATGATGTTTTAAAGAAGGCAAGGACTA 
ACTTAGAGGTGAAATCCCTAAGGGCTCTGCCGTGTACTCCAGCCTTGTGACCCTTGCC 
TTCCAGGAACCATGCAAGAAGCGCAGCCACCAGAAGTCCTTAAAACAGCAGGAAAGGT 


GAGCCTGTCCCCCTTTTGTGCAGCTACCTATCTGCTGAGGAGCATCTGGGCCTCATTC 


CTCCAAGT 




ORF Start: ATG at 155 


ORF Stop: TGA at 6950 




SEQ ID NO: 296 


2265 aa 


MW at 255081.5kD 


NOV103c, 

CG59773-03 Protein Sequence 


MRPAGRTSTSGGDVENLNSQNEAELRRQFEERQQETEHVYELLENKIQLLQEESRLAK 
NEAARMAALVEAEKECNLELSEKLKGVTKNWEDVPGDQVKPDQYTETLAQRDKRIEEL 
NQSLAAQERLVEQLSREKQQLLHLLEEPTSMEVQPMTEELLKQQKLNSHETTITQQSV 
SDSHLAELQEKIQQTEATNKILQEKLNEMSYELKCAQESSQKQDGTIQNLKETLKSRE 
RETEELYQVIEGQNDTMAKLREMLHQSQLGQLQSSEGTSPAQQQVALLDLQSALFCSQ 
LEIQKLQRVVRQKERQIJUDAKQCVQFVEAAAHESEQQKEASWKHNQELRKALQQLQEE 
LQNKSQQLRAWEAEKYNEIRTQEQNIQHLNHSLSHKEQLLQEFRELLQYRDNSDKTLE 
ANEMLLEKLRQRIHDKAVALERAIDEKFSALEEKEKELRQLRLAVRERDHDLERLRDV 
LSSNEATMQSMESLIJiAKGLEVEQLSTTCQNLQWLKEEMETKFSRWQKEQESIIQQLQ 
TSLHDRNKEVEDLSATLLCKLGPGQSEIAEELCQRLQRKERMLQDLLSDRNKQVLEHE 
MEIQGLliQSVSTREQESQAAAEKLVQALMERNSELQALRQYLGGRDSLMSQAPISNQQ 

KEELELMAKKERESQMELSALQSMMAVQEEELQVQAADMESLTRNIQIKEDLIKDLQM 
QLVDPEDIPAMERLTQEVLLLREKVASVESQGQEISGNRRQQQLLLMLEGLVDERSRL 
NEALQAERQLYSSLVKFHAHPESSERDRTLQVELEGAQVLRSRLEEVLGRSLERLNRL 
ETIiAAIGGAAAGDDTEDTSTEFTDSIEEEAAHHSHQQLVKVALEKSLATVETQNPSFS 
PPSPMGGDSNRCLQEEMLHLRAEIHQHLEEKRKAEEELKELKAQIEEAGFSSVSHIRN 
TMLS LCLENAELKEQMGETMSDGWE I EEDKEKGEVMVETWTKEGLSESS LQAE FRKL 
QGKLKNAHNIINLLKEQLVLSSKEGNSKLTPELLVHLTSTIERINTELVGSPGKHQHQ 
EEGNVTVRPFPRPQSIiDLGATFTVDAHQQLDNQSQPRDPGPQPAFSLPGSTQHLRSQL 
SQCKQRYQDLQEKLLLSEATVFAQANELEKYRVMLSESLVKQDSKQIQVDFQDLGYET 
CGRSENEAEREETTSPECEEHNSLKEMVLMEGLCSEQGRRGSTLASSSERKPLENQLG 
KQEEFRVYGKSENILVLRKDIEDLKAQLQNANKVIQNLKSRVRSLSVTSDYSSSLERP 
RKLRAVGTLEGSSPHSVPDEDEGWLSDGTGAFYSPGLQAKKDLESLIQRVSQLEAQLP 
ENGLEEKLAEELRSASWPGKYDSLIQDQARELSYLRQKIREGRGICYLITQHAKDTVK 
SFEDLLRSNDIDYYLGQSFREQLAQGSQLTERLTSKLSTEDHKSEKDQAGLEPLALRL 
SRELQEKEKVIEVLQAKLDARSLTPSSSRALSDSHRSPSSTSFLSDELEACSDMDIVS 
EYTHYEEKKASPSHSGSSASQGAKAESNSNPISLPTPQNTPKEANQAHSGFHFHSIPK 
LASLPQAPLPSAPSSFLPFSPTGPPLLGCCETPEVSLAESQQELQMLQKQLGESSTVP 
PASTATLLSNDLEADSSYYLNSAQPHSPPRGTIELGRILEPGYLGSSGKWDVMRPQKG 
SVSGDLSSGSSVYQLNSKPTGADLLEEHLGEIWNLRQRLEESICINDCLREQLEHRLT 
STARGRGSTSNFYSQGLESIPQLCNENRVLREENRRLQAQLSHVSRGHSQETESLREA 
LLSSRSHLQELEKELEHQKVERQQLLEDLREKQQEVLHFREERLSLQENDSRLQHKLV 
LLQQQCEEKQQLFESLQSELQIYEALYGNSKKGLKAYSLDACHQIPLSSDLSHLVAEV 
QALRGQLEQSIQGNNCLRLQLQQQLESGAGKASLSPSSINQNFPASTDPGNKQLLLQG 
SAVSPPVRDVGMNSPALVFPSSASSTPGSDSVVLSFSFSGLGLDTSPVMKTPPKLEGD 
ATDGSFANKHGRHVIGHIDDYSALRQQIAEGKLLVKKIVSLVRSACSFPGLEAQGTEG 
SKGIHELRSSTSALHHALEESASLLTMFWRAALPSTHIPVLPGKQGESTERELLELRT 
KVSKQEQLLQSTTEHLKNANQQKESMEQFIVSVTRTHDVLKKARTNLEVKSLRALPCT 
PAL 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 103B. 



Table 103B. Comparison of NOV103a against NOV103b through NOV103c. 


Protein Sequence 


NOV103a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV103b 


365..2196 
202..2016 


1510/1834 (82%) 
1518/1834 (82%) 


NOV103c 


365..2196 
140..1954 


1510/1834 (82%) 
1518/1834 (82%) 
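
As a reading aid, the residue ranges and alignment lengths reported in Table 103B can be cross-checked with a few lines of Python. This is only a sketch: it assumes that the `start..end` notation is 1-based and inclusive and that the denominator of each identity fraction is the alignment length including gaps. The function name `span_length` is illustrative and is not part of the original analysis.

```python
def span_length(residue_range: str) -> int:
    """Number of residues covered by a 1-based, inclusive 'start..end' range."""
    # Tolerate stray spaces such as "140.. 1954" (assumed to be extraction noise).
    start_text, end_text = residue_range.replace(" ", "").split("..")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid residue range: {residue_range!r}")
    return end - start + 1


# Values taken from the NOV103b row of Table 103B.
query_span = span_length("365..2196")   # NOV103a residues
match_span = span_length("202..2016")   # NOV103b residues
alignment_length = 1834                  # denominator of 1510/1834

print(query_span, match_span)                                        # 1832 1815
print(alignment_length - query_span, alignment_length - match_span)  # 2 19
```

Under the stated assumptions, the two differences (2 and 19) would correspond to the number of gap positions in the NOV103a and NOV103b rows of the alignment, respectively.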



Further analysis of the NOV103a protein yielded the following properties shown in 
Table 103C. 



Table 103C Protein Sequence Properties NOV103a 


PSort 
analysis: 


0.5855 probability located in mitochondrial matrix space; 0.4200 probability 
located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.2957 
probability located in mitochondrial inner membrane 


SignalP 
analysis: 


Likely cleavage site between residues 39 and 40 



A search of the NOV103a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 103D. 



Table 103D. Geneseq Results for NOV103a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV103a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAY71159 


Human phosphodiesterase 
interacting protein, myomegalin - 
Homo sapiens, 2517 aa. 
[WO200027861-A1, 18-MAY- 
2000] 


1..2196 
1..2204 


2193/2204 (99%) 
2193/2204 (99%) 


0.0 


AAM40183 


Human polypeptide SEQ ID NO 
3328 - Homo sapiens, 1883 aa. 
[WO200153312-A1, 26-JUL-2001] 


635..2196 
1..1570 


1557/1570 (99%) 
1559/1570 (99%) 


0.0 


AAY71158 


Rat phosphodiesterase interacting 
protein, myomegalin - Rattus sp, 
2326 aa. [WO200027861-A1, 18- 
MAY-2000] 


365..2197 
202..2017 


1433/1837 (78%) 
1572/1837 (85%) 


0.0 


AAY67600 


Human adipose tissue protein #3 - 
Homo sapiens, 944 aa. 
[JP2000037190-A, 08-FEB-2000] 


1..934 
1..934 


925/934 (99%) 
927/934 (99%) 


0.0 


AAU01768 


sapiens, 934 aa. [WO200123546-A1, 05-APR-2001] 


197..934 


733/738 (98%) 


0.0 









In a BLAST search of public sequence databases, the NOV103a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 103E. 



Table 103E. Public BLASTP Results for NOV103a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV103a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for the 
Matched Portion 


Expect 
Value 


O75042 


KIAA0454 PROTEIN - Homo 
sapiens (Human), 1882 aa 
(fragment). 


636..2196 
1..1569 


1558/1569 (99%) 
1558/1569 (99%) 


0.0 




MYOMEGALIN - Rattus 
norvegicus (Rat), 2324 aa. 


91 07 
OUJ..Z i y 1 

202..2015 


lOJO \IO/0) 

1581/1838 (85%) 


0.0 


O75065 


KIAA0477 PROTEIN - Homo 
sapiens (Human), 1132 aa. 


1..1132 
1..1132 


1132/1132 (100%) 
1132/1132 (100%) 


0.0 


Q25893 


LIVER STAGE ANTIGEN - 
Plasmodium falciparum (isolate 
NF54), 1909 aa. 


356..1459 
605..1651 


243/1129 (21%) 
488/1129 (42%) 


4e-35 


Q13439 


Golgi autoantigen, golgin subfamily 
A 4 (Trans-Golgi p230) (256 kDa 
golgin) (Golgin-245) (72.1 protein) - 
Homo sapiens (Human), 2230 aa. 


229..1749 
267..1814 


349/1638(21%) 
679/1638 (41%) 


4e-34 



PFam analysis predicts that the NOV103a protein contains the domains shown in the 
Table 103F. 



Table 103F. Domain Analysis of NOV103a 


Pfam Domain 


NOV103a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


Somatomedin B: domain 1 of 1 


150..189 


14/47 (30%) 
25/47 (53%) 


7.6 


recA: domain 1 of 1 


621..650 


8/30 (27%) 
22/30 (73%) 


8.1 


Ribosomal L10: domain 1 of 1 


604..695 


20/109 (18%) 
59/109(54%) 


9.9 


Dishevelled: domain 1 of 1 


844..914 


19/74 (26%) 
37/74 (50%) 


2.7 


Transposase 22: domain 1 of 
1 


1135..1416 


71/376(19%) 
127/376(34%) 


4.6 


Phe tRNA-synt N: domain 1 
of 1 


2079..2152 


13/79(16%) 
49/79 (62%) 


4.9 



Example 104. 

The NOV 104 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 104A. 



Table 104A. NOV104 Sequence Analysis 




SEQ ID NO: 297 


736 bp 


NOV104a, 

CG57460-01 DNA Sequence 


AAAGCACCCGAGATGACCCCGGCTCCTCCACCAGGAGCGCGGCCGGGCGCGGCGTCCC 
TAGCGGGCTTCGCCGGGGTGGCGTCTCTGGGGCCTGGGGACCCCCGCCGCGCCGCTGA 
CCCGCGCCCTCTGCCCCCAGCGCTGTGCTTCGCCGTGAGCCGCTCGCTGCTGCTGACG 
TGCCTGGTGCCGGCCGCGCTGCTGGGCCTGCGCTACTACTACAGCCGCAAGGTGATCC 
GCGCCTACCTGGAGTGCGCGCTGCACACGGACATGGCGGACATCGAGCAGTACTACAT 
GAAGCCGCCCGGTGTGTCCCTGACCGCCCTATCCCCTGCAGGCTCCTGCTTCTGGGTG 
GCCGTGCTGGATGGCAACGTGGTGGGCATTGTGGCTGCACGGGCCCACGAGGAGGACA 
ACACGGTGGAGCTGCTGCGGATGTCTGTGGACTCACGTTTCCGAGGCAAGGGCATCGC 
CAAGGCGCTGGGCCGGAAGGTGCTGGAGTTCGCCGTGGTGCACAACTACTCCGCGGTG 
GTGCTGGGCACGACGGCCGTCAAGGTGGCCGCCCACAAGCTCTACGAGTCGCTGGGCT 
TCAGACACATGGGCGCCAGTGACCACTACGTGCTGCCGGGCATGACCCTCTCGCTGGC 
TGAGCGCCTCTTCTTCCAGGTCCGCTACCACCGCTACCGCCTGCAGCTGCGCGAGGAG 
TGACCGCCGCCGCTCGCCCGCCCGCCCCCCCGGCCGCCCT 




ORF Start: ATG at 13 


ORF Stop: TGA at 697 




SEQ ID NO: 298 


228 aa 


MW at 24767.5kD 


NOV104a, 

CG57460-01 Protein Sequence 


MTPAPPPGARPGAASLAGFAGVASLGPGDPRRAADPRPLPPALCFAVSRSLLLTCLVP 
AALLGLRYYYSRKVIRAYLECALHTDMADIEQYYMKPPGVSLTALSPAGSCFWVAVLD 
GNVVGIVAARAHEEDNTVELLRMSVDSRFRGKGIAKALGRKVLEFAVVHNYSAVVLGT 
TAVKVAAHKLYESLGFRHMGASDHYVLPGMTLSLAERLFFQVRYHRYRLQLREE 
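
The ORF coordinates and the protein length reported in Table 104A can be checked against each other. The following Python sketch assumes that both positions are 1-based and that the "ORF Stop" position refers to the first base of the stop codon, which is consistent with the values in the table; the function name `orf_protein_length` is illustrative only.

```python
def orf_protein_length(orf_start: int, orf_stop: int) -> int:
    """Residue count encoded between a start codon and a stop codon.

    orf_start: 1-based position of the first base of the start codon (ATG).
    orf_stop:  1-based position of the first base of the stop codon.
    """
    coding_nt = orf_stop - orf_start  # bases from the ATG up to, not including, the stop
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("ORF coordinates do not define a whole number of codons")
    return coding_nt // 3


# NOV104a: ORF Start ATG at 13, ORF Stop TGA at 697.
print(orf_protein_length(13, 697))  # 228
```

The result, 228, agrees with the 228 aa stated for SEQ ID NO: 298.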
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Further analysis of the NOV 104a protein yielded the following properties shown in 
Table 104B. 



Table 104B. Protein Sequence Properties NOV104a 


PSort 
analysis: 


0.6400 probability located in plasma membrane; 0.4600 probability located in 
Golgi body; 0.3700 probability located in endoplasmic reticulum (membrane); 
0.1000 probability located in endoplasmic reticulum (lumen) 


SignalP 
analysis: 


Likely cleavage site between residues 64 and 65 



A search of the NOV 104a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 104C. 



Table 104C. Geneseq Results for NOV104a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV104a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB19986 


Human camello 3 (Hcml3) protein 
(partial) - Homo sapiens, 144 aa. 
[WO200077024-A1, 21-DEC-2000] 


42..195 
1..144 


144/154 (93%) 
144/154 (93%) 


7e-76 


AAB19985 


Human camello 2 (Hcml2) protein - 
Homo sapiens, 227 aa. 
[WO200077024-A1, 21-DEC-2000] 


47..200 
56..203 


63/158 (39%) 
92/158 (57%) 


1e-21 


AAB19984 


Human camello 1 (Hcml1) protein - 
Homo sapiens, 227 aa. 
[WO200077024-A1, 21-DEC-2000] 


41..196 
50..199 


60/160 (37%) 
88/160 (54%) 


7e-20 


AAY57959 


Human TSC501 protein SEQ ID 
NO:1 - Homo sapiens, 227 aa. 
[JP11332579-A, 07-DEC-1999] 


41..196 
50..199 


59/160 (36%) 
87/160 (53%) 


4e-19 


AAB19987 


Mouse camello 1 (Mcml1) protein - 
Mus sp, 222 aa. [WO200077024-A1, 
21-DEC-2000] 


41..194 
50..197 


63/158 (39%) 
87/158 (54%) 


1e-18 
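
The identity percentages in the table above appear to be the identity fraction expressed as a truncated whole-number percentage. The Python sketch below illustrates that reading; the truncation rule is an assumption inferred from the tabulated identity values (for example 144/154 is shown as 93%), it has been checked only against the identity rows, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Whole-number percentage of matching positions, truncated (not rounded)."""
    if aligned_length <= 0 or not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and a positive aligned length")
    # Integer arithmetic avoids floating-point surprises at exact boundaries.
    return (100 * matches) // aligned_length


# Identity fractions taken from the Geneseq rows for NOV104a.
for matches, length in [(144, 154), (63, 158), (60, 160), (59, 160)]:
    print(f"{matches}/{length} ({percent_identity(matches, length)}%)")
# 144/154 (93%)
# 63/158 (39%)
# 60/160 (37%)
# 59/160 (36%)
```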



In a BLAST search of public sequence databases, the NOV 104a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 104D. 



Table 104D. Public BLASTP Results for NOV104a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV104a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9UHF3 


PUTATIVE N- 
ACETYLTRANSFERASE 
CAMELLO 2 - Homo sapiens 
(Human), 227 aa. 


47..200 
56..203 


63/158 (39%) 
92/158 (57%) 


5e-21 


Q9UHE5 


PUTATIVE N- 

ACETYLTRANSFERASE CML1 - 
Homo sapiens (Human), 227 aa. 


41..196 
50..199 


60/160 (37%) 
88/160 (54%) 


3e-19 


Q9UQ17 


GLA PROTEIN - Homo sapiens 
(Human), 227 aa. 


41..196 
50..199 


60/160 (37%) 
88/160 (54%) 


3e-19 


Q96QI8 


KIDNEY- AND LIVER-SPECIFIC 
GENE - Homo sapiens (Human), 
227 aa. 


41..196 
50..199 


59/160 (36%) 
87/160 (53%) 


1e-18 


O75839 


TSC501 PROTEIN - Homo sapiens 
(Human), 227 aa. 


41..196 
50..199 


59/160 (36%) 
87/160 (53%) 


1e-18 



PFam analysis predicts that the NOV 104a protein contains the domains shown in the 
Table 104E. 



Table 104E. Domain Analysis of NOV104a 


Pfam Domain 


NOV104a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


Acetyltransf: domain 1 of 
1 


111..191 


28/82 (34%) 
64/82 (78%) 


2.2e-17 



Example 105. 



The NOV 105 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 105A. 



Table 105A. NOV105 Sequence Analysis 




SEQ ID NO: 299 


1230 bp 


NOV105a, 

CG57464-01 DNA Sequence 


CTTCCCGGCGGCTGCGCGATGGACAGCCCCGAGGTGACCTTCACTCTCGCCTATCTGG 
TGTTCGCCGTGTGCTTCGTGTTCACGCCCAACGAGTTCCACGCGGCGGGGCTCACGGT 
GCAGAACCTGCTGTCGGGCTGGCTGGGCAGCGAGGACGCCGCCTTCGTGCCCTTCCAC 
TTGCGCCGCACGGCCGCCACGCTGTTGTGCCACTCGCTGCTGCCGCTCGGTGAGGCTG 
CTCGGGCCGGCCGGCCGCATCCTCTCCTGCGCAGGGCTTGCTGGGAGGTCAGGAGGAG 
GCCTCCGCCAGCTCCCCGAGGCCCCGAAAGCGCCTGGGCGCAGCTGGGGAGAGGCGCC 
GGTCCTCATCCAGAGGGACCGCGGCGTGGGCTGAGCGCGCTTAGGGGTGCCGCCGGCC 
TGGCCTGGCGGCTCTTCCTGCTGCTGGCCGTGACCCTCCCCTCCATCGCCTGCATCCT 
GATCTACTACTGGTCCCGTGACCGGTGGGCCTGCCACCCACTGGCGCGCACCCTGGCC 
CTCTACGCCCTCCCACAGTCTGGCTGGCAGGCTGTTGCCTCCTCTGTCAACACTGAGT 
TCCGGCGGATTGACAAGTTTGCCACCGGTGCACCAGGTGCCCGTGTGATTGTGACAGA 
CACGTGGGTGATGAAGGTAACCACCTACCGAGTGCACGTGGCCCAGCAGCAGGACGTG 
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CACCTGACTGTGACGGAGTCTCGGCAGCATGAGCTCTCGCCAGACTCGAACTTGCCCG 
TGCAGCTCCTCACCATCCGTGTGGCCAGCACCAACCCTGCTGTGCAGGCCTTTGACAT 
CAGGCTGAACTCCACTGAGTACGGGGAGCTCTGCGAGAAGCTCCGGGCACCCATCCGC 
AGGGCAGCCCATGTGGTCATCCACCAGAGCCTGGGCGACCTGTTCCTGGAGACATTTG 
CCTCCCTGGTAGAGGTCAACCCGGCCTACTCAGTGCCCAGCAGCCAGGTGGGGGGCCT 
GGAGGCCTGCATAGGCTGCATGCAGACACGTGCCAGCGTGAAGCTGGTGAAGACCTGC 
CAGGAGGCAGCCACAGGCGAGTGCCAGCAGTGTTACTGCCGCCCCATGTGGTGCCTCA 
CCTGCATGGGCAAGTGGTTCGCCAGCCGCCAGGACCCCCTGCGCCCTGACACCTGGCT 
GGCCAGCCGCGTGCCCTGCCCCACCTGCCGCGCACGCTTCTGCATCCTGGATGTGTGC 
ACCGTGCGCTGA 




ORF Start: ATG at 19|ORF Stop: TGA at 1228 




SEQ ID NO: 300 |403 aa 


MW at 44585.0kD 


NOV105a, 

CG57464-01 Protein Sequence 


MDSPEVTFTLAYLVFAVCFVFTPNEFHAAGLTVQNLLSGWLGSEDAAFVPFHLRRTAA 
TLLCHSLLPLGEAARAGRPHPLLRRACWEVRRRPPPAPRGPESAWAQLGRGAGPHPEG 
PRRGLSALRGAAGLAWRLFLLLAVTLPSIACILIYYWSRDRWACHPLARTLALYALPQ 
SGWQAVASSVNTEFRRIDKFATGAPGARVIVTDTWVMKVTTYRVHVAQQQDVHLTVTE 
SRQHELSPDSNLPVQLLTIRVASTNPAVQAFDIRLNSTEYGELCEKLRAPIRRAAHVV 
IHQSLGDLFLETFASLVEVNPAYSVPSSQVGGLEACIGCMQTRASVKLVKTCQEAATG 
ECQQCYCRPMWCLTCMGKWFASRQDPLRPDTWLASRVPCPTCRARFCILDVCTVR 



Further analysis of the NOV105a protein yielded the following properties shown in 
Table 105B. 



Table 105B. Protein Sequence Properties NOV105a 


PSort 
analysis: 


0.6760 probability located in plasma membrane; 0.1000 probability located in 
endoplasmic reticulum (membrane); 0.1000 probability located in endoplasmic 
reticulum (lumen); 0.1000 probability located in outside 


SignalP 
analysis: 


Likely cleavage site between residues 29 and 30 



A search of the NOV105a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 105C. 



Table 105C. Geneseq Results for NOV105a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV105a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG81377 


Human AFP protein sequence SEQ 
ID NO:272 - Homo sapiens, 362 aa. 
[WO200129221-A2, 26-APR-2001] 


1..403 
1..362 


344/409 (84%) 
345/409 (84%) 


0.0 



In a BLAST search of public sequence databases, the NOV105a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 105D. 



Table 105D. Public BLASTP Results for NOV105a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV105a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


CAC38627 


SEQUENCE 271 FROM PATENT 
WO0129221 - Homo sapiens 
(Human), 362 aa. 


1..403 
1..362 


345/409 (84%) 
346/409 (84%) 


0.0 


Q9DCF3 


0610039G24RIK PROTEIN - Mus 
musculus (Mouse), 362 aa. 


1..403 
1..362 


311/403 (77%) 
328/403 (81%) 


e-176 


Q96GP5 


SIMILAR TO RIKEN CDNA 
0610039G24 GENE - Homo 

sapiens (Human), 232 aa. 


1..265 
1..226 


211/271 (77%) 
212/271 (77%) 


e-109 


Q9VN16 


CG14646 PROTEIN - Drosophila 
melanogaster (Fruit fly), 409 aa. 


1..399 
1..383 


123/409 (30%) 
202/409 (49%) 


1e-55 


Q95TM4 


LD39811P - Drosophila 
melanogaster (Fruit fly), 393 aa. 


20..399 
4..367 


117/390 (30%) 
192/390 (49%) 


1e-51 



PFam analysis predicts that the NOV 105a protein contains the domains shown in the 
Table 105E. 



Table 105E. Domain Analysis of NOV105a 



Pfam Domain 



NOV105a Match Region 



Identities/ 
Similarities 
for the Matched Region 



Expect Value 



No Significant Matches Found 



Example 106. 

The NOV106 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 106A. 



Table 106A. NOV106 Sequence Analysis 




SEQ ID NO: 301 


1136 bp 


NOV106a, 

CG57466-01 DNA Sequence 


TTTCTGCAATGGG AG CCTCCGCCACCCAC C CTGGAGCCACAG AAGG CCCAGAAGCCAA 
ATGGACAG CTGGTGAACCCCJ^CAACTTCTGGAAGAACCCGAAAG ATGTGTGCGCC CA 
CGCCCATGGCCTCTCAGGGCCCAGGCCTGGGACGTGACCACCACTAACTGCTCAGCCA 
ATATCAACTTGACCCACCAGCCCTGGTTCCAGGTCCTGGAGCCGCAGTTCCGGCAGTT 
TCTCTTCTACCGCCACTGCCGCTACTTCCCCATGCTGCTGAACCACCCGGAGAAGTGC 
AGGGGCGATGTCTACOTSCTGGTCGTTGTCAAGTCXKSTC^TCACGCAGavCGACCXSCC 
GCGAGGCCATCCGCCAGACCTGGGCGCGAGCGGCAGTCCGCGGGTGGGGGCCGAGCGC 
CGTGCGCACCCTCTTCCTGCTGGGCACGGCCTCCAAGCAGGAGGAGCGCACGCACTAC 
CAGCAGCTGCTGGCCTACGAAGACGCCCTCTACGGCGACATCCTGCAGTGGGGCTTTC 
TCGACJVCCTTCTTCAACCTGACCCTCAAGGAGATCCACTTCCTCJ^GTXjGCTGGACAT 
CTACnKaCCCCC^CGTCCCCTTCATTTTCAAAGGCGACGATGACGTCTTCGTCAACCCC 
ACCAACCTG CTAGAATTTCTGGCTGACCGGCAGCC^CAGG AAAACCTCTO CX5TCX3GCG 
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ATGTCCTGCAGCACGCTCGGCCCATTCGCAGGAAAGACAACAAATACTACATCCCGGG 
GGCCCTGTACGGCAAGGCCAGCTATCCGCCGTATGCAGGCGGCGGTGGCTTCCTCATG 
GCCGGCAGCCTGGCCCGGCGCCTGCACCATGCCTGCGACACCCTGGAGCTCTACCCGA 
TCGACGACGTCTTTCTGGGCATGTGCCTGGAGGTGCTGGGCGTGCAGCCCACGGCCCA 
CGAGGGCTTCAAGACTTTCGGCATCTCCCGGAACCGCAACAGCCGCATGAACAAGGAG 
CCGTGCTTTTTCCGCGCCATGCTCGTGGTGCACAAGCTGCTGCCCCCTGAGCTGCTCG 
CCATGTGGGGGCTGGTGCACAGCAATCTCACCTGCTCCCGCAAGCTCCAGGTGCTCTG 
AC CCC AG CCGGG CTACTAGGACAGGCCAGGGCAC 




ORF Start: ATG at 9 


ORF Stop: TGA at 1101 




SEQ ID NO: 302 


364 aa MW at 41853.8kD 


NOV106a, 

CG57466-01 Protein Sequence 


MGASATHPGATEGPEAKWTAGEPQQLLEEPERCVRPRPWPLRAQAWDVTTTNCSANIN 
LTHQPWFQVLEPQFRQFLPYRHCRYFPMLLNHPEKCRGDVYLLVVVKSVITQHDRREA 
IRQTWARAAVRGWGPSAVRTLFLLGTASKQEERTHYQQLLAYEDALYGDILQWGFLDT 
FFNLTLKEIHFLKWLDIYCPHVPFIFKGDDDVFVNPTNLLEFLADRQPQENLFVGDVL 
QHARPIRRKDNKYYIPGALYGKASYPPYAGGGGFLMAGSLARRLHHACDTLELYPIDD 
VFLGMCLEVLGVQPTAHEGFKTFGISRNRNSRMNKEPCFFRAMLVVHKLLPPELLAMW 
GLVHSNLTCSRKLQVL 



Further analysis of the NOV 106a protein yielded the following properties shown in 
Table 106B. 



Table 106B. Protein Sequence Properties NOV106a 


PSort 
analysis: 


0.6400 probability located in microbody (peroxisome); 0.4500 probability 
located in cytoplasm; 0.3122 probability located in lysosome (lumen); 0.1000 
probability located in mitochondrial matrix space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV 106a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 106C. 



Table 106C. Geneseq Results for NOV106a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV106a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB24035 


Human PRO4397 protein sequence 
SEQ ID NO:42 - Homo sapiens, 402 
aa. [WO200053750-A1, 14-SEP- 
2000] 


72..352 
84..380 


149/300 (49%) 
191/300 (63%) 


4e-76 


AAU29167 


Human PRO polypeptide sequence 
#144 - Homo sapiens, 372 aa. 
[WO200168848-A2, 20-SEP-2001] 


26..363 
27..371 


149/348 (42%) 
207/348 (58%) 


9e-76 


AAB88404 


Human membrane or secretory 
protein clone PSEC0159 - Homo 


26..363 
27..371 


149/348 (42%) 
207/348 (58%) 


9e-76 








JAN-2001] 








AAB49750 


Human beta 1,3-N-acetylglucosamine 
transferase protein G4 - Homo 
sapiens, 372 aa. [WO200100848-A1, 
04-JAN-2001] 


26..363 
27..371 


149/348 (42%) 
207/348 (58%) 


9e-76 


AAB49749 


Human beta 1,3-N-acetylglucosamine 
transferase protein G4 - Homo 
sapiens, 372 aa. [WO200100848-A1, 
04-JAN-2001] 


26..363 
27..371 


149/348 (42%) 
207/348 (58%) 


9e-76 



In a BLAST search of public sequence databases, the NOV 106a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 106D. 



Table 106D. Public BLASTP Results for NOV 106a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV106a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities 
for the 
Matched 
Portion 


Expect 
Value 


AAL32295 


BETA-3-GALACTOSYLTRANSFERASE - 
Brachydanio rerio (Zebrafish) (Zebra danio), 
418 aa. 


46..364 
101..417 


199/319 
(62%) 

249/319 
(77%) 


e-121 


AAL32297 


BETA-3-GALACTOSYLTRANSFERASE - 
Brachydanio rerio (Zebrafish) (Zebra danio), 
412 aa. 


29..360 
82..409 


180/337 
(53%) 

244/337 
(71%) 


e-104 


Q96EK0 


UNKNOWN (PROTEIN FOR MGC:20513) - 
Homo sapiens (Human), 377 aa. 


60..352 
46..355 


152/313 
(48%) 

198/313 
(62%) 


9e-76 


CAC39768 


SEQUENCE 175 FROM PATENT 
EP1067182 - Homo sapiens (Human), 372 aa. 


26..363 
27..371 


149/348 
(42%) 

207/348 
(58%) 


3e-75 


Q9C0J2 


BETA-1,3-N- 
ACETYLGLUCOSAMINYLTRANSFERASE 
BGNT-3 - Homo sapiens (Human), 372 aa. 


26..363 
27..371 


149/348 
(42%) 

207/348 
(58%) 


3e-75 



PFam analysis predicts that the NOV106a protein contains the domains shown in the 
Table 106E. 



Table 106E. Domain Analysis of NOV106a 


Pfam Domain 


NOV106a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


PI3 PI4 kinase: domain 1 
of 1 


195..205 


8/12 (67%) 
10/12 (83%) 


8.5 


Galactosyl_T: domain 1 of 1 


112..308 


69/212 (33%) 
148/212 (70%) 


7.7e-45 



Example 107. 

The NOV 107 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 107A. 



Table 107A. NOV107 Sequence Analysis 




SEQ ID NO: 303 


4091 bp 


NOV107a, 

CG57468-01 DNA Sequence 


AAG C AAG AGG CTGAG ATGG AT CTTG AG G CGG C AAAG AACGG AACAG CCTGGCGCCCCA 
CGAGCGCGGAGGGCGACTTTGAACTGGGCATCAGCAGCAAACAAAAAAGGAAAAAAAC 
GAAGACAGTGAAAATGATTGGAGTATTAACATTGTTTCGATACTCCGATTGGCAGGAT 
AAATTGTTTATGTCG CTGGGTAC CATCATGGCCATAGCTCACGG ATCAGGTCTCCCCC 
T CATGATG ATAGTATTTGGAGAG ATGACTGACAAATTTGTTG ATACTG CAGG AAACTT 
CTCCTTTCCAGTGAACTTTTCCTTGTCGCTGCTAAATCCAGGCAAAATTCTGGAAGAA 
GAAATG ACTAGATATG CATATTACT ACTCAGGATTGGGTGCTGGAGTT CTTGTTGCTG 
CCTATATACAAGTTTCATTTTGGACTTTGGCAGCTGGTCGACAGATCAGGAAAATTAG 
GCAGAAGTTTTTTCATGCTATTCTACGACAGGAAATAGGATGGTTTGACATCAATGAC 
ACCACTGAACTC AAT ACG CGG CTAACAGATGACATCTCCAAAATCAGTG AAGG AATTG 
GTGACAAGG TTGG AATGT T CT TT C AAG CAGT AG C CACG TTTTTTG CAGG ATT CAT AG T 
GGGATTCATCAGAGGATGGAAGCTCACCCTTGTGATAATGGCCATCAGCCCTATTCTA 
GGACTCTCTGCAGCCGTTTGGGCAAAGATACTCTCGGCATTTAGTGACAAAGAACTAG 
CTGCTTATGCAAAAGCAGGCGCCGTGGCAGAAGAGGCTCTGGGGGCCATCAGGACTGT 
GATAGCTTTCGGGGGCCAGAACAAAGAGCTGGAAAGGTATCAGAAACATTTAGAAAAT 
G CCAAAG AG ATTGG AATTAAAAAAGCT ATTTCAGCAAACATTT CC ATGGGTATTG CCT 
TCCTGTTAATATATGCATCATATGCACTGGCCTTCTGGTATGGATCCACTCTAGTCAT 
ATCAAAAGAATATACTATTGGAAATGCAATGACAGTTTTTTTTTCAATCCTAATTGGA 
G CTATGGCCATCGG AG AAACGCT CGTTTTGGCTCCTG AATATTCCAAAGCCAAATCGG 
GGGCTGCGCATCTGTTTGCCTTGTTGGAAAAGAAACCAAATATAGACAGCCGCAGTCA 
AGAAGGGAAAAAGCCAGTAAGCGACACATGTGAAGGGAATTTAGAGTTTCGAGAAGTC 
TCTTTCTTCTATCCATGTCGCCCAGATGTTTTCATCCTCCGTGGCTTATCCCTCAGTA 
TTG AGCG AGG AAAG ACAGT AG C ATTTGTGGGGAGC AGCGG CTGTGGGAAAAGCACTTC 
TGTTCAACTT CTG CAG AG ACTTTATGACC CCGTG CAAGGAC AAGTGGATGGTGTGGAT 
GCAAAAGAArrGAATGTACAGTGGCTCCGTTCCCAAATAGCAATCGTTCCTCAAGAGC 
CTGTGCTCTTCAACTGCAGCATTGCTGAGAACATCGCCTATGGTGACAACAGCCGTGT 
GGTGCCATT AGATGAG ATCAAAG AAGC CG CAAATGC AGCAAATATCCATTCTTTT ATT 
GAAGGTCTCCCTGAGAAATACAACACACAAGTTGGACTGAAAGGAGCACAGCTTTCTG 
GCGGCCAGAAACAAAGACTAGCTATTGCAAGGGCTCTTCTCCAAAAACCCAAAATTTT 
ATTGTTGGATGAGGCCACTTCAGCCCTCGATAATGACAGTGAGTGGCAGGTGGTTCAG 
CATGCCCTTGATAAAGCCAGGACGGGAAGGACATGCCTAGTGGTCACTCACAGGCTCT 
CTGCAATTCAGAACGCAGATTTGATAGTGGTTCTGCACAATGGAAAGATAAAGGAACA 
AGGAACTCATCAAGAGCTCCTGAGAAATCGAGACATATATTTTAAGTTAGTGAATGCA 
CAGTCAGCGAGCAAAGGTCGGACTACAATCGTGGTAGCACACCGACTTTCTACTATTC 
GAAGTGCAGATTTGATTGTGACCCTAAAGGATGGAATGCTGGCGGAGAAAGGAGCACA 
TG CTG AACT AATGG CAAAA C G AGGTCT AT ATT AT TC ACTTG TG ATGT CAC AGGTAATG 
CTTATGGGGACTCTTTCAGACTGTGGTAATAGTCTTCCTGAAGTCTCTCTATTAAAAA 
TTTTAAAGTTAAACAAGCCTGAATGGCCTTTTGTGGTTCTGGGGAC^TTGGCTTCTGT 
TCTAAATGGAACTGTT CATC C AGTATTTTCCATCATCTTTGCAAAAATTAT AACCGTA 
ATGTTTG^AAATAATGATCTTTTGTTTTTCCTCAAAATTTTTTTATATTCATTCCTTT 
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TGTTTTTCCTCAAACMGGTTTCAGCGTAGATTTTTGTTTGTTTGCTTTTCAGGGATT 
ATTTTACGGCAGAGCAGGGGAAATTTTAACGATGAGATTAAGACACTTGGCCTTCAAA 
GCCATGTTATATCAGGATATTGCCTGGTTTGATGAAAAGGAAAACAGCACAGGAGGCT 
TG ACAACAAT ATT AGCCAT AG AT ATAGCACAAATT CAAGGAG C AAC AGGTTCCAGG AT 
TGGCGTCTTAACACAAAATGCAACTAACATGGGACTTTCAGTTATCATTTCCTTTATA 
TATGGATGGGAGATGACATTCCTGATTCTGAGTATTGCTCCT^GTACTTGCCGTGACAG 
GAATGATTGAAACCGCAGCAATGACTGGATTTGCCAACAAAGATAAGCAAGAACTTAA 
GCATGCTGGAAAGGTAAAGATAGCAACTGAAGCTTTGGAGAATATACGTACTATAGTG 
TCATTAACAAGGGAAAAAGCCTTCGAGCAAATGTATGAAGAGATGCTTCAGACTCAAC 
ACAGGAGAAATACCTCGAAGAAAGCACAGATTATTGGAAGCTGTTATGCATTCAGCCA 
TGCCTTTATATATTTTGCCTATGCGGCAGGGTTTCGATTTGGAGCCTATTTAATTCAA 
GCTGGACGAATGTCAAATGCTTTAtCTTTTGATAGAGTTTTTACTGCAATTGCATATG 
GAGCTATGGCCATCGG AG AAACG CTCGTTTTGGCTC CTG AAT ATTCCAAAGCCAAATC 
GGGGGCTGCGCATCTGTTTGCCTTGTTGGAAAAGAAACCAAATATAGACAGCCGO^GT 
C AAGAAGGGAAAAAG C C A CTTTC ACAG G ACACATG TG AAGGG AATTT AGAG TTTCG AG 
AAGTCTCTTTCTTCTATCCATGTCGCCCAGATGTTTTCATCCTCCGTGGCTTATCCCT 
CAGTATTGAGCGAGGAAAGACAGTAGCATTTGTGGGGAGCAGCGGCTGTGGGAAAAGC 
ACTTCTGTTCAACTT CTGCAG AGACTTT ATGACC C CGTGCAAGGACAACAGCTGTTTG 
ATGGTGTGGATGCAAAAGAATTG AATGT ACAGTGGCTCCGTTC C CAAATAGCAATCGT 
TCCTCAAGAGCCTGTGCTCTTCAACTGCAGCATTGCTGAGAACATCGCCTATGGTGAC 
AACAGCCX5TGTGGTGCCATTAGATGAGATCAAAGAAGCCGCAAATGCAGCAAATATCC 
A1TCTTTTATTGAAGGTCTCCCTAAATACAACACACAAGTTGGACTGAAAGGAGCACA 
G CTTTCTGGCGG CCAGAAACAAAG ACT AG CTATTGCAAGGGCTCTTCTCCAAAAACCC 
AAAATTTTATTGTTGGATGAGGCCACTTCAGCCCTCGATAATGACAGTGAGAAGGTAC 
AGGTGGTTCAGCATGCCCTTGATAAAGCCAGGACGGGAAGGACATGCCTAGTGGTCAC 
TCAC AGGCTCTCTG CAATT C AGAACGCAG ATTTGAT AGTGGTTCTGCACAATGGAAAG 
ATAAAGGAACAAGGAACTCATCAAGAGCTCCTGAGAAATCGAGACATATATTTTAAGT 
TAGTGAATGCAC AGTCAG CGAGC AAAGGTCGGACTACAAT CGTGGT AGC ACACCGACT 
TTCT ACT ATTCGAAGTG CAGATTTG ATTGTGACCCT AAAGGATGG AATGCTGG CGG AG 
AAAGGAGCACATGCTGAACTAATGGCAAAACGAGGTCTATATTATTCACTTGTGATGT 
CACAGGTAATGCTT ATGTGACAT AATG CTAT 




ORF Start: ATG at 16 


ORF Stop: TGA at 4078 




SEQ ID NO: 304 


1354 aa 


MW at 149167.3kD 


NOV107a, 

CG57468-01 Protein Sequence 


MDLEAAKNGTAWRPTSAEGDFELGISSKQKRKKTKTVKMIGVLTLFRYSDWQDKLFMS 
LGTIMAIAHGSGLPLMMIVFGEMTDKFVDTAGNFSFPVNFSLSLLNPGKILEEEMTRY 
AYYYSGLGAGVLVAAYIQVSFWTLAAGRQIRKIRQKFFHAILRQEIGWFDINDTTELN 
TRLTDDISKISEGIGDKVGMFFQAVATFFAGFIVGFIRGWKLTLVIMAISPILGLSAA 
VWAKILSAFSDKELAAYAKAGAVAEEALGAIRTVIAFGGQNKELERYQKHLENAKEIG 
IKKAISANISMGIAFLLIYASYALAFWYGSTLVISKEYTIGNAMTVFFSILIGAMAIG 
ETLVLAPEYSKAKSGAAHLFALLEKKPNIDSRSQEGKKPVSDTCEGNLEFREVSFFYP 
CRPDVFILRGLSLSIERGKTVAFVGSSGCGKSTSVQLLQRLYDPVQGQVDGVDAKELN 
VQWLRSQIAIVPQEPVLFNCSIAENIAYGDNSRWPLDEIKEAANAANIHSFIEGLPE 
KY1ITQVGLKGAQLSGGQKQRLAIARALLQKPKILLLDEATSALDNDSEWQVVQHALDK 
ARTGRTCLVVTHRLSAIQNADLIWLHNGKIKEQGTHQELLRNRDIYFKLVNAQSASK 
GRTTIWAHRLSTIRSADLIVTLKDGMLAEKGAHAELMAKRGLYYSLVMSQVMLMGTIj 
SDCGNSLPEVSLLKILKLNKPEWPFWLGTLASVLNGTVHPVFSIIFAKIITVMFGNN 
DLLFFLKIFLYSFLLFFLKQGFSVDFCLFAFQGLFYGRAGEILTMRLRHLAFKAMLYQ 
DIAWFDEKENSTGGLTTILAIDIAQIQGATGSRIGVLTQNATNMGLSVIISFIYGWEM 
TFLILSIAPVIiAVTGMIETAAMTGFANKDKQEIjKHAGKVKIATEALENIRTIVSLTRE 
KAFEQMYEEMLQTQHRRNTSKKAQIIGSCYAFSHAFIYFAYAAGFRFGAYLIQAGRMS 
NALSFDRVFTAIAYGAMAIGETLVLAPEYSKAKSGAAHLFALLEKKPNIDSRSQEGKK 
PLSQDTCEGNLEFREVSFFYPCRPDVFI LRGLSLS I ERGKTVAFVGSSGCGKSTSVQL 
LQRLYDPVQGQQLFDGVDAKELNVQWLRSQIAIVPQEPVLFNCSIAENIAYGDNSRW 
PLDE I KEAANAANI HS FI EGLPKYNTQVGLKGAQLSGGQKQRLAI ARALLQKPKI LLL 
DEATSALDNDSEKVQWQHAI^KARTGRTCLVVTHRLSAIQNADLIVVLHNGKIKEQG 
THQELLRNRD I YFKLVNAQS ASKGRTTI WAHRLST I RS ADLI VTLKDGMLAEKGAHA 
ELMAKRGLYYSLVMSQVMLM 



Further analysis of the NOV 107a protein yielded the following properties shown in 
Table 107B. 



Table 107B. Protein Sequence Properties NOV107a 


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.4000 probability located in 
Golgi body; 0.3000 probability located in endoplasmic reticulum (membrane); 
0.3000 probability located in microbody (peroxisome) 


SignalP 
analysis: 



No Known Signal Sequence Predicted 



A search of the NOV 107a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 107C. 



Table 107C. Geneseq Results for NOV107a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV107a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB81064 


Cynomologous monkey P- 
glycoprotein variant 1 - Macaca 
fascicularis, 1280 aa. 
[WO200123565-A1, 05-APR-2001] 


1..1299 
1..1278 


750/1312 (57%) 
964/1312 (73%) 


0.0 


AAB81065 


Cynomologous monkey P- 
glycoprotein variant 2 - Macaca 
fascicularis, 1283 aa. 
[WO200123565-A1, 05-APR-2001] 


1..1299 
1..1281 


749/1312(57%) 
967/1312(73%) 


0.0 


AAB81959 


Human MDR1 - Homo sapiens, 
1280 aa. [WO200121762-A2, 29- 
MAR-2001] 


1..1299 
1..1278 


749/1324 (56%) 
967/1324(72%) 


0.0 


AAY58186 


Human wild-type multidrug 
resistance-1 (MDR-1) protein - 
Homo sapiens, 1280 aa. 
[WO9961589-A2, 02-DEC-1999] 


1..1299 
1..1278 


749/1324 (56%) 
967/1324(72%) 


0.0 


AAW44073 


Human multidrug resistance P- 
glycoprotein MDR1 - Homo sapiens, 
1280 aa. [WO9740160-A1, 30-OCT- 
1997] 


1..1299 
1..1278 


749/1324(56%) 
967/1324 (72%) 


0.0 



In a BLAST search of public sequence databases, the NOV107a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 107D. 



Table 107D. Public BLASTP Results for NOV107a 


Protein 

Accession 
Number 


Protein/Organism/Length 


NOV107a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


P23174 


Multidrug resistance protein 3 (P- 
glycoprotein 3) - Cricetulus griseus 
(Chinese hamster), 1281 aa. 


1..1299 
1..1279 


818/1303 (62%) 
999/1303 (75%) 


0.0 


P21440 


Multidrug resistance protein 2 (P- 
glycoprotein 2) - Mus musculus 
(Mouse), 1276 aa. 


1..1299 
1..1274 


823/1306 (63%) 
998/1306 (76%) 


0.0 


Q08201 


Multidrug resistance protein 2 (P- 
glycoprotein 2) - Rattus norvegicus 
(Rat), 1278 aa. 


1..1299 
1..1276 


823/1309(62%) 
999/1309(75%) 


0.0 


CAC37764 


SEQUENCE 1 FROM PATENT 
WO0123565 - Macaca fascicularis 
(Crab eating macaque) (Cynomolgus 
monkey), 1280 aa. 


1..1299 
1..1278 


750/1312(57%) 
964/1312(73%) 


0.0 


CAC37765 


SEQUENCE 3 FROM PATENT 
WO0123565 - Macaca fascicularis 
(Crab eating macaque) (Cynomolgus 
monkey), 1283 aa. 


1..1299 
1..1281 


749/1312(57%) 
967/1312(73%) 


0.0 



PFam analysis predicts that the NOV107a protein contains the domains shown in the 
Table 107E. 



Table 107E. Domain Analysis of NOV107a 


Pfam Domain 


NOV107a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


ABC_membrane: domain 1 of 2 


57..350 


115/301 (38%) 
252/301 (84%) 


3.3e-83 


MVIN: domain 1 of 1 


57..447 


70/531 (13%) 
263/531 (50%) 


5.8 


SAA_proteins: domain 1 of 1 


518..524 


6/7 (86%) 
7/7 (100%) 


3 


ABC_tran: domain 1 of 2 


424..609 


76/199 (38%) 
150/199 (75%) 


3.1e-56 


DsbD: domain 1 of 1 


722..926 


126/249 (51%) 


9.6 




ABC_membrane: domain 2 of 2 


722..1008 


80/297 (27%) 
222/297 (75%) 


2.2e-43 


ABC_tran: domain 2 of 2 


1083..1270 


77/202 (38%) 
154/202 (76%) 


7.1e-54 


GidB: domain 1 of 1 


1170..1312 


29/202 (14%) 
97/202 (48%) 


6.6 
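
By way of illustration only, the rows of Table 107E can be separated into strong and weak Pfam matches by their Expect values. The following is a minimal Python sketch; the cutoff of 1e-5 and the name `significant_domains` are assumptions chosen for the example and are not part of the analysis described herein. The DsbD row is carried with its Expect value only.

```python
# Rows transcribed from Table 107E: (Pfam domain, match region, Expect value).
TABLE_107E = [
    ("ABC_membrane", "57..350", 3.3e-83),
    ("MVIN", "57..447", 5.8),
    ("SAA_proteins", "518..524", 3.0),
    ("ABC_tran", "424..609", 3.1e-56),
    ("DsbD", "722..926", 9.6),
    ("ABC_membrane", "722..1008", 2.2e-43),
    ("ABC_tran", "1083..1270", 7.1e-54),
    ("GidB", "1170..1312", 6.6),
]


def significant_domains(rows, cutoff=1e-5):
    """Return the rows whose Expect value is below the (assumed) cutoff."""
    return [row for row in rows if row[2] < cutoff]


if __name__ == "__main__":
    for name, region, evalue in significant_domains(TABLE_107E):
        print(f"{name}\t{region}\t{evalue:g}")
```

Under this assumed cutoff only the two ABC_membrane and two ABC_tran matches are retained.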



Example 108. 

The NOV108 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 108A. 



Table 108A. NOV108 Sequence Analysis 




SEQ ID NO: 305 


520 bp 


NOV108a, 

CG59609-01 DNA Sequence 


CCCCGTTCTATCAGCCATGGTCAACCCCACCAGGTTCTTAGACATCATCGTGGATGGT 
GAGCTCTTGGGACGTGTCTCCTTTGAGCTGTTTGCAGACAAGATTCCAAAGACAGCAG 
AAAATTTTTGTGCTCTAATCATTGGAGAGAAAGGATTTGGTTATAAAGGTTCCTACTT 
TCACAGAATTGTTCCTGGGTTTATGTGTCAGGGTGGTGACTTCACACAGCATAATGGC 
ACTGGTGGCAAGTCCATCTACGGGAAGAAATTTGATGATGAGAACTTCGTCCTAAATT 
ATACAGGTCCTGGCATCTTGTGCGTGGAGAATGCTGGACCCAACACAAATGGTTCCCA 
GTTTTTCATCTGCACTGCCATGTCTGAGTGGTTGGATGGCATGCAGGTGGTCTTTGGC 
AAGGGAAGGAAGGTGAGTATTGTGGAAGCCATGGAGTGCTTTGGGTCCACAAATGGCA 
AGACCAGCAAGAAGATCACCATTGCTGACTGTGGACAACTCTAATAGGTTTGACTT 




ORF Start: ATG at 17 


ORF Stop: TAA at 506 




SEQ ID NO: 306 


163 aa 


MW at 17734.1kD 


NOV108a, 

CG59609-01 Protein Sequence 


MVNPTRFLDIIVDGELLGRVSFELFADKIPKTAENFCALIIGEKGFGYKGSYFHRIVP 
GFMCQGGDFTQHNGTGGKSIYGKKFDDENFVLNYTGPGILSVENAGPNTNGSQFFICT 
AMSEWLDGMQVVFGKGRKVSIVEAMECFGSTNGKTSKKITIADCGQL 
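
The relationship between the ORF coordinates and the predicted polypeptide in Table 108A can be sketched in Python as follows. This is an illustrative sketch only: the DNA string is the sequence of SEQ ID NO: 305 as transcribed from the table with extraction whitespace removed (an assumption, so any base misread in extraction will carry into the translation), and the names `translate_orf` and `CODON_TABLE` are hypothetical. The standard genetic code is assumed.

```python
# SEQ ID NO: 305 as transcribed from Table 108A (whitespace removed; assumption).
DNA = (
    "CCCCGTTCTATCAGCCATGGTCAACCCCACCAGGTTCTTAGACATCATCGTGGATGGT"
    "GAGCTCTTGGGACGTGTCTCCTTTGAGCTGTTTGCAGACAAGATTCCAAAGACAGCAG"
    "AAAATTTTTGTGCTCTAATCATTGGAGAGAAAGGATTTGGTTATAAAGGTTCCTACTT"
    "TCACAGAATTGTTCCTGGGTTTATGTGTCAGGGTGGTGACTTCACACAGCATAATGGC"
    "ACTGGTGGCAAGTCCATCTACGGGAAGAAATTTGATGATGAGAACTTCGTCCTAAATT"
    "ATACAGGTCCTGGCATCTTGTGCGTGGAGAATGCTGGACCCAACACAAATGGTTCCCA"
    "GTTTTTCATCTGCACTGCCATGTCTGAGTGGTTGGATGGCATGCAGGTGGTCTTTGGC"
    "AAGGGAAGGAAGGTGAGTATTGTGGAAGCCATGGAGTGCTTTGGGTCCACAAATGGCA"
    "AGACCAGCAAGAAGATCACCATTGCTGACTGTGGACAACTCTAATAGGTTTGACTT"
)

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna, start):
    """Translate from a 1-based start position up to the first stop codon.

    Returns (protein, stop_position), where stop_position is the 1-based
    position of the first base of the stop codon, or None if none is found.
    """
    protein = []
    for i in range(start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            return "".join(protein), i + 1
        protein.append(residue)
    return "".join(protein), None


if __name__ == "__main__":
    protein, stop = translate_orf(DNA, 17)
    print(len(protein), stop)
    print(protein)
```

Starting at position 17, the sketch reads 163 codons before reaching a TAA stop codon at position 506, in agreement with the ORF coordinates and polypeptide length stated in the table.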



Further analysis of the NOV108a protein yielded the following properties shown in 
Table 108B. 



Table 108B. Protein Sequence Properties NOV108a 


PSort 
analysis: 


0.6400 probability located in microbody (peroxisome); 0.6000 probability 
located in plasma membrane; 0.4500 probability located in cytoplasm; 0.1000 
probability located in mitochondrial matrix space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
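
The PSort summary in Table 108B is a semicolon-separated list of probabilities and locations. A minimal Python sketch of how such a summary might be parsed is given below; the function names are hypothetical and the regular expression assumes the exact phrasing "probability located in" used in the table.

```python
import re

# PSort summary transcribed from Table 108B.
PSORT_108B = (
    "0.6400 probability located in microbody (peroxisome); "
    "0.6000 probability located in plasma membrane; "
    "0.4500 probability located in cytoplasm; "
    "0.1000 probability located in mitochondrial matrix space"
)

_ENTRY = re.compile(r"(\d\.\d+)\s+probability located in\s+(.+)")


def parse_psort(text):
    """Split a PSort summary into (probability, location) pairs."""
    pairs = []
    for part in text.split(";"):
        match = _ENTRY.fullmatch(part.strip())
        if match:
            pairs.append((float(match.group(1)), match.group(2).strip()))
    return pairs


def most_likely_location(text):
    """Return the location with the highest reported probability."""
    return max(parse_psort(text), key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    for probability, location in parse_psort(PSORT_108B):
        print(f"{probability:.4f}\t{location}")
    print("highest:", most_likely_location(PSORT_108B))
```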



A search of the NOV108a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 108C. 



Table 108C. Geneseq Results for NOV108a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV108a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAU01195 


Human cyclophilin A protein - Homo 
sapiens, 165 aa. [WO200132876-A2, 
10-MAY-2001] 


1..163 
1..164 


134/164(81%) 
147/164 (88%) 


2e-75 


AAW56028 


Calcineurin protein - Mammalia, 165 
aa. [WO9808956-A2, 05-MAR-1998] 


1..163 
1..164 


134/164(81%) 
147/164(88%) 


2e-75 


AAR13726 


Bovine cyclophilin - Bos taurus, 163 
aa. [US5047512-A, 10-SEP-1991] 


2..163 
1..163 


133/163 (81%) 
146/163 (88%) 


5e-75 


AAG65275 


Haematopoietic stem cell 
proliferation agent related human 
protein #2 - Homo sapiens, 164 aa. 
[JP2001163798-A, 19-JUN-2001] 


2..163 
1..163 


133/163 (81%) 
146/163 (88%) 


9e-75 


AAP90431 


Cyclophilin - Homo sapiens (human), 
164 aa. [EP326067-A, 02-AUG-1989] 


2..163 
1..163 


133/163 (81%) 
146/163(88%) 


9e-75 
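
Each Identities/Similarities cell in the tables above has the form count/length (percent). The following Python sketch, offered only as an illustration, parses such a cell and recomputes the ratio; `parse_match` and `computed_percent` are hypothetical names, and the rounding convention used for the printed percentages is not stated in the source, so the recomputed value is reported alongside the printed one rather than asserted to equal it.

```python
import re

_MATCH = re.compile(r"\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)\s*")


def parse_match(cell):
    """Parse a cell such as '134/164 (81%)' into (count, length, stated_percent)."""
    match = _MATCH.fullmatch(cell)
    if match is None:
        raise ValueError(f"unrecognized match cell: {cell!r}")
    return int(match.group(1)), int(match.group(2)), int(match.group(3))


def computed_percent(count, length):
    """Recompute the percentage from the raw counts."""
    return 100.0 * count / length


if __name__ == "__main__":
    for cell in ("134/164(81%)", "147/164 (88%)"):
        count, length, stated = parse_match(cell)
        print(cell, "->", stated, f"{computed_percent(count, length):.1f}")
```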



In a BLAST search of public sequence databases, the NOV108a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 108D. 



Table 108D. Public BLASTP Results for NOV108a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV108a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


CAC39529 


SEQUENCE 26 FROM PATENT 
WO0132876 - Homo sapiens 
(Human), 165 aa. 


1..163 
1..164 


134/164(81%) 
147/164(88%) 


8e-75 


Q9BRU4 


PEPTIDYLPROLYL ISOMERASE A 
(CYCLOPHILIN A) - Homo sapiens 
(Human), 165 aa. 


1..163 
1..164 


134/164 (81%) 
146/164 (88%) 


2e-74 


P04374 


Peptidyl-prolyl cis-trans isomerase A 
(EC 5.2.1.8) (PPIase) (Rotamase) 
(Cyclophilin A) (Cyclosporin A- 
binding protein) - Bos taurus (Bovine), 
and, 163 aa. 


2..163 
1..163 


133/163 (81%) 
146/163 (88%) 


2e-74 


P05092 


Peptidyl-prolyl cis-trans isomerase A 
(EC 5.2.1.8) (PPIase) (Rotamase) 
(Cyclophilin A) (Cyclosporin A- 
binding protein) - Homo sapiens 
(Human), 164 aa. 


2..163 
1..163 


133/163 (81%) 
146/163 (88%) 


3e-74 








Q9TTC6 


CYCLOPHILIN 18 - Oryctolagus 
cuniculus (Rabbit), 164 aa. 


1..163 
1..164 


133/164 (81%) 
147/164 (89%) 


5e-74 



PFam analysis predicts that the NOV108a protein contains the domains shown in the 
Table 108E. 



Table 108E. Domain Analysis of NOV108a 


Pfam Domain 


NOV108a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


pro_isomerase: domain 1 of 1 


5..163 


101/181 (56%) 
137/181 (76%) 


5.2e-79 



Example 109. 

The NOV109 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 109A. 



Table 109A. NOV109 Sequence Analysis 




SEQ ID NO: 307 


887 bp 


NOV109a, 

CG59613-01 DNA Sequence 


GATATCAriTTT^ATGGCAGCCATTGTTAAGCCTCCAGAACCTATACCTTTAAAATX3G 

TT AA C AGAT AAGC CAGTTT GG AT AG AA C AATGG C C ACTG AGT AAAG AG AAACTGG AGG 

CTTTAGAGGATTTGGTTACTGAACAATTCTCAATAATCATTTTCCAAAAAGTGAACCT 

ACACAGCATGAAAGTATCACACATTTCCTTAGTGCAGCTAACCCTGTGTGACCAGGGC 

TT CAACACAT A CCACTGTG AC CACAAC CT AG CCATG AG CATG AG C CTCACC AG CATGT 

CCAAAATGCTAAAATACAACAATGGCAGTGAAGACATCACTACATGGAGGGCTGAAGG 

TACTATGGATCTCTTGGTGCTAGAATTTGAAGCACTAAATCAAGAGAACTTTGTGGAC 

TGTGAATTCAAGTTAATGACTCTAGATGTTG^GCAACTTGAAATTCCAGAACAAGAGT 

ACAGCTCTCTAATAAAGATGCATTCTAGTGAATTTGTTCATATATGCCAAGATCTCAG 

T C AT ATTGG AG AGTCTGCT AT AATT T CTTG TGCAAAAG ATGG AGTG AATTTTTCTG CA 

AATGGAGAACTTGGACATGGAAACATTGCCACAATTCCCCAAACAA^ 

AAG AAG AGG AGG CTG TTGCCAT AATG ATG AATG GG C C AG TTCAG CT AACTTTTG CAC T 

AAGTTACTTAAATTTCTTTATAACAGGCACTCCACTCTCTCAGATGCACCCCTTGCTG 

GAGAGTATAAGATTGCCGGATATGGAACATTTAAAGTATTATTTGGCTCCCAAAATTG 

AGGATGAAAAAGGATTTTAGAAATTCTTAGAATCCAAGAAAATAAAACTAAGCTCTTT 


GAAAATTGCTTCTGAGA 




ORF Start: ATG at 14 


ORF Stop: TAG at 830 




SEQ ID NO: 308 


272 aa 


MW at 30831.1kD 


NOV109a, 

CG59613-01 Protein Sequence 


MAAIVKPPEPIPLKWLTDKPVWIEQWPLSKEKLEALEDLVTEQFSIIIFQKVNLHSMK 
VSHISLVQLTLCDQGFNTYHCDHNLAMSMSLTSMSKMLKYNNGSEDITTWRAEGTMDL 
LVLEFEALNQENFVDCELKLMTLDVEQLEIPEQEYSCVIKMHSSEFVHICQDLSHIGE 
SAIISCAKDGVNFSANGELGHGNIATIAQTSNYNKEEEAVAIMMNGPVQLTFALSYLN 
FFITGTPLSQMHPLLESIRLPDMEHLKYYLAPKIEDEKGF 



Further analysis of the NOV109a protein yielded the following properties shown in 
Table 109B. 



Table 109B. Protein Sequence Properties NOV109a 


PSort 
analysis: 


0.6500 probability located in cytoplasm; 0.1000 probability located in 
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 
0.0000 probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


Likely cleavage site between residues 19 and 20 



A search of the NOV109a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 109C. 



Table 109C. Geneseq Results for NOV109a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV109a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAY51639 


Human PCNA protein fragment - 
Homo sapiens, 261 aa. 
[WO200008164-A2, 17-FEB-2000] 


25..271 
8..260 


158/255 (61%) 
184/255 (71%) 


8e-78 


AAY52010 


Human PCNA protein - Homo 
sapiens, 261 aa. [DE19840771-A1, 
10-FEB-2000] 


25..271 
8..260 


158/255 (61%) 
184/255 (71%) 


8e-78 


AAB43712 


Human cancer associated protein 
sequence SEQ ID NO:1157 - Homo 
sapiens, 269 aa. [WO200055350-A1, 
21-SEP-2000] 


25..271 
16..268 


158/255 (61%) 
184/255 (71%) 


8e-78 


AAG75139 


Human colon cancer antigen protein 
SEQ ID NO:5903 - Homo sapiens, 
268 aa. [WO200122920-A2, 05- 
APR-2001] 


25..269 
16..266 


157/253 (62%) 
182/253 (71%) 


5e-77 


AAW90758 


Human PCNA protein fragment #2 - 
Homo sapiens, 236 aa. 
[DE19840771-A1, 10-FEB-2000] 


39..268 
1..236 


149/238 (62%) 
171/238(71%) 


7e-73 
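
The Residues/Match Residues cells give 1-based, inclusive ranges such as 25..271. As an illustrative sketch only, the fraction of the 272-residue NOV109a polypeptide covered by a match can be computed in Python as shown below; `parse_range` and `coverage` are hypothetical helper names, and stray spaces inside a range are tolerated on the assumption that they arise from text extraction.

```python
def parse_range(cell):
    """Parse a cell such as '25..271' into a (start, end) tuple of integers."""
    start, end = cell.replace(" ", "").split("..")
    return int(start), int(end)


def coverage(cell, total_length):
    """Fraction of a sequence of total_length residues spanned by the range."""
    start, end = parse_range(cell)
    return (end - start + 1) / total_length


if __name__ == "__main__":
    # NOV109a (272 aa) residues matched by the PCNA entries in Table 109C.
    print(parse_range("25..271"), f"{coverage('25..271', 272):.3f}")
```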



In a BLAST search of public sequence databases, the NOV109a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 109D. 



Table 109D. Public BLASTP Results for NOV109a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV109a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


P12004 


Proliferating cell nuclear antigen 
(PCNA) (Cyclin) - Homo sapiens 
(Human), 261 aa. 


25..271 
8..260 


158/255 (61%) 
184/255 (71%) 


3e-77 


P04961 


Proliferating cell nuclear antigen 
(PCNA) (Cyclin) - Rattus norvegicus 
(Rat), 261 aa. 


25..271 
8..260 


158/255 (61%) 
185/255 (71%) 


5e-77 


P57761 


Proliferating cell nuclear antigen 
(PCNA) - Cricetulus griseus (Chinese 
hamster), 261 aa. 


25..271 
8..260 


158/255 (61%) 
184/255 (71%) 


7e-77 


Q91ZH2 


11 DAYS EMBRYO CDNA, RIKEN 
FULL-LENGTH ENRICHED 
LIBRARY, CLONE:2700095L20, 
FULL INSERT SEQUENCE - Mus 
musculus (Mouse), 261 aa. 


25..272 
8..261 


156/256(60%) 
183/256 (70%) 


1e-75 


P17918 


Proliferating cell nuclear antigen 
(PCNA) (Cyclin) - Mus musculus 
(Mouse), 261 aa. 


25..270 
8..259 


155/254(61%) 
182/254 (71%) 


5e-75 



PFam analysis predicts that the NOV109a protein contains the domains shown in the 
Table 109E. 



Table 109E. Domain Analysis of NOV109a 


Pfam Domain 


NOV109a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


PCNA: domain 1 of 1 


23..143 


46/128 (36%) 
83/128 (65%) 


2.3e-20 


PCNA_C: domain 1 of 1 


145..265 


59/131 (45%) 
98/131 (75%) 


1.6e-45 



Example 110. 



The NOV110 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 110A. 



Table 110A. NOV110 Sequence Analysis 



SEQ ID NO: 309 


1233 bp 


NOV110a, 

CG59619-01 DNA Sequence 


TGGCAATGGAAGAAGAGATCCCCGCGCTCTTCATTGACAATGGCTCCGGCATGTGGAA 


AGCAGCTTTGCTGGGAGACAATGCCCTCCGAGCCATATTCCCCTCCATCATCGGGCAC 
CCCCGGCACCAGGGCGTGATGGTGGGCATGGGCCAGAAGGACTCCTACGTGGGCGACC 
AGG CCC AG AGCAAGTG CGG CATCCTGACCCTG AAGTACCCCATCAAGCATGGC ATCGT 
CACAAACTGGGACGACATGGAGAAGATCTGGCACCATGTTTTCTACAACGAGCTGTGC 
GTGGCCCTGGAGGAGCAGGTGGTGCTGCTGACCGAGGCCCCGCTAAACCCCAGGGCCA 
ATAGGGAGAAGATGACTCAGATCATGTTTAAGACCTTCAACACCCAGGCCATGTACGT 
GGCCATTCAGGCCGTGCTGACCCTCCACAGCTCTGGTTGCACCACTGGCATTGTCATG 
G ACTCTGGAGATGGGGTC ACCCAC ACAGTG CCCATCTACGAGCGCC ACAC CCTCC CTC 
ACACCATCTTGCATCTGGACCTGGCTGGCCAGGACCTTACTGACTACCTCATGAAGAT 
CCCTACCTACCGCAGCTATAGCTTC^CACCATGGCCAAGTGGAAAATCGTGCGCAAC 
ATCAAGGAGAAGCTATGCTATGTCGCTCTGGACTTCGAGGAGGAGATGGCCACTGCTG 
CATCCTCCTCCTCCCTGGAGAAGAGCTACGAGCTGCCTGACAGCCAGGCCATCATTAT 
TAGCAATGAGCGGTTCCG^TGTCCGGAGGCACTGTTCCAGCCTTCCTTCCTGGGCATG 
GAATCCTGTGGCATCCATGAAAGTACCTTCAACTCCATCATGAAGTGTGATATGGACA 
TCCCCAAAGACCTGTACGCCAACACAGTGCTGTCTGGCGTCACCACCATGTACCCTGG 
CATCCCCAATAGGATGCAGAAGGAGATCACTGCCCTGGCATCCAGCACCATGAAGATC 
AAGATATCGTGCCCCATCGTGCCCCCAGAGTGCAAGTACTTTGTGTGGATCGGTGGCT 
CCATCCTGGCCTCACTGTCCACCTTCCAGCAGATGTGGATTAGCAAGCAGGAGTACAA 
CG AGTCGGGC CC CTCCATCATCCACCGCAAATGGACTG CGAGCAGATGCATAGCATTT 
GCTGC7VTGGGTTAATTCAGAAGTATAAATTTGCCCCTGGCAAATGCATATACCTCATG 




CTAG CCTCACGATAC 








ORF Start: ATG at 6 


ORF Stop: TAA at 1185 




SEQ ID NO: 310 


393 aa 


MW at 44147.5kD 


NOV110a, 

CG59619-01 Protein Sequence 


MEEE I PALFI DNGSGMWKAALLGDNALRAI FPSI IGHPRHQGVMVGMGQKDSYVGDQA 
QSKCGILTLKYPIKHGIVTNlTODMEKIWHHVFYNELCVAIiEEQVVLLTEAPLNPRANR 
EKMTQIMFKTFNTQAMWAIQAVLTLHSSGCTTGIVMDSGDGVTHTVPIYERHTLPHT 
I LHLDLAGQD LTD YliMKI PTYRS YS FNTMAKWKI VRN I KEKLC YVALD FE E EMATAAS 
SSSLEKSYELPDSQAIIISNERFRCPEALFQPSFLGMESCGIHESTFNSIMKCDMDIP 
KDLYANTVLSGVTT^PGIPNRMQKEITALASSTMKIKISCPIVPPECKYFVWIGGSI 
LAS LST FQQMW I S KQE YNE S G PS 1 1 HRKWT AS RC I AFAAWVNS E V 



Further analysis of the NOV110a protein yielded the following properties shown in 
Table 110B. 



Table 110B. Protein Sequence Properties NOV110a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.1547 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV110a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 110C. 



Table 110C. Geneseq Results for NOV110a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV110a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAU32060 


Novel human secreted protein #2551 - 
Homo sapiens, 399 aa. 
[WO200179449-A2, 25-OCT-2001] 


1..376 
25..397 


315/376 (83%) 
336/376(88%) 


e-180 


AAB43991 


Human cancer associated protein 
sequence SEQ ID NO: 1436 - Homo 
sapiens, 413 aa. [WO200055350-A1, 
21-SEP-2000] 


1..376 
39..411 


311/376(82%) 
336/376(88%) 


e-179 


AAP61532 


Sequence of beta-actin - Homo 
sapiens, 375 aa. [EP174608-A, 19- 
MAR-1986] 


1..376 
1..373 


311/376 (82%) 
335/376 (88%) 


e-179 


AAB12985 


Human beta-actin protein sequence - 
Homo sapiens, 374 aa. [US6087398- 
A, 11-JUL-2000] 


2..376 
1..372 


310/375 (82%) 
334/375(88%) 


e-178 


AAR50328 


Drug resistant structural protein - 
Homo sapiens, 375 aa. [JP06038773- 
A, 15-FEB-1994] 


1..376 
1..373 


309/376 (82%) 
335/376(88%) 


e-178 
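
Several Expect values in the tables are printed without a leading mantissa (for example e-180), which appears to be an abbreviation of 1e-180. A minimal Python sketch for converting such cells to numbers is given below for illustration; `parse_expect` is a hypothetical name, and the interpretation of a bare exponent as having mantissa 1 is an assumption.

```python
def parse_expect(cell):
    """Convert an Expect-value cell such as 'e-180', '2e-75' or '0.0' to float.

    A bare exponent ('e-180') is read as 1e-180 (assumed convention).
    """
    text = cell.strip().lower()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


if __name__ == "__main__":
    cells = ["e-180", "e-179", "e-178", "2e-75", "0.0"]
    for cell in sorted(cells, key=parse_expect):
        print(cell, parse_expect(cell))
```

Sorting by the parsed value orders the hits from most to least significant, with 0.0 first.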



In a BLAST search of public sequence databases, the NOV110a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 110D. 



Table 110D. Public BLASTP Results for NOV110a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV110a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


P02571 


Actin, cytoplasmic 2 (Gamma-actin) - 
Homo sapiens (Human), 375 aa. 


1..376 
1..373 


315/376(83%) 
336/376(88%) 


e-179 


ATBOG 


actin gamma (tentative sequence) - 
bovine, 374 aa. 


2..376 
1..372 


314/375(83%) 
335/375(88%) 


e-179 


P53505 


Actin, cytoplasmic type 5 - Xenopus 
laevis (African clawed frog), 376 aa. 


2..376 
3..374 


313/375(83%) 
335/375(88%) 


e-178 


P29751 


Actin, cytoplasmic 1 (Beta-actin) - 
Oryctolagus cuniculus (Rabbit), 375 
aa. 


1..376 
1..373 


311/376(82%) 
337/376(88%) 


e-178 


O93400 


ACTIN, CYTOPLASMIC 1 (BETA- 
ACTIN) (CYTOPLASMIC BETA 
ACTIN) - Xenopus laevis (African 
clawed frog), 375 aa. 


1..376 
1..373 


311/376(82%) 
336/376(88%) 


e-178 









PFam analysis predicts that the NOV110a protein contains the domains shown in the 
Table 110E. 



Table 110E. Domain Analysis of NOV110a 


Pfam Domain 


NOV110a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


actin: domain 1 of 1 


1..378 


284/382 (74%) 
336/382 (88%) 


2.2e-227 



Example 111. 

The NOV111 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 111A. 



Table 111A. NOV111 Sequence Analysis 




SEQ ID NO: 311 


1197 bp 


NOV111a, 

CG59621-01 DNA Sequence 


AACCATGTCTAAGCGGGAGTCCTTTAACCTGGAAAGTTATGAATTGGACAAAAGCTTC 
TGGCTAACCAGATTCACTG AACTG AAGGG CACAGGTTGCAAAGTG CCCCAAG ATGTCT 
TGCAAAAATTGCTGGAATCTTTACAGGAGAACCACTTCCAAGAAGATGAGCAGTTTCT 
GGGAGCCGTTATGCCAAGGCTTCGCATTCGAATGGATACTTGTCCCATTTCTTTGAGG 
CATGGTGGGCTTTCCTTGGTTCAAACCACAGATTACATTTACCCGATCGTAGACGACC 
CTTA<^TGATGGGCAGGATAGCATGTGCCAATGTCCTCAGTGACCTCTATGCAATGGG 
GGTCACAGAATGTGACAATATGCTGATGCTCCTTGGAGTCAGTAATAAAATGACCGAC 
AGGGAAAGGGATAAAGTGATGCCTCTGATTATCCAAAGTTTTAAAGATGCAGCTGAGG 
AAGCAGGAATGTCTGTAATGGTCAGCCAAACAGTACTAAATCCCTGGATTGTCCTGGG 
AGG AGTCACTACCACTGT CTTCCAGCC CAATGAATTT ATC ATGCCAGACAATGCAGTG 
CCAGGGGACGTGCTGGTGTTGACAAAACCCCTGGGGACACAGGTGGCAGTGGCTGTGC 
ACCAGTGGCTGGATATTCCTTTGAAATGGAATAAGATTAAGCTAGTGGTCACCGAAGA 
TGTAGAGCTGGCCAACCAGGAGGCGATGATX5AACATGGTGAGGCTCAACAGGACAGCT 
GCAGGACTCATGCACACGTTCAATGCCCACATGGCCACTGACATCACGGGCTTCGGGA 
TTTTGGGCCACGTGCAGAACCTAGCCAAGCAGCAGAGGAACGAGGTGTCGTTTGTAAT 
TCACAACCTCCTGGTCCTGGCCAAGATGGCTGCGGTGAGCAAGGCCTGCGGAAACATG 
TTCAGCCTCATCCATCGGACCTGCCCGGAGACCTCAGGCGGCCTTCTGATCTGTTTAC 
CATGTCAGCAAGCAGCTCGGTTCTGTGCAGAGATAAAGTCCCCCAAATATAGTGAAGG 
CCACCAAGCATGGATTATTGGGATTGTAGAGAAGGGCAACCACACAGCCAGAATCATA 
GACAAACCCCAGATCATCAAGGTTGCACCACAAGTGGCCACTCAAAATGTGAATCTCA 
CA CCCGGG G C CACAT CTT AATCT AGAC AG AAAT AGCT 




ORF Start: ATG at 5 


ORF Stop: TAA at 1178 




SEQ ID NO: 312 


391 aa 


MW at 43193.9kD 


NOV111a, 

CG59621-01 Protein Sequence 


MSKRESFNLESYELDKSFWLTRFTELKGTGCKVPQDVIiQKLLESLQENHFQEDEQFLG 
AVMPRLRIGMDTCAISLRiCGGI^LVQTTOYiyPIVTJDPYMMGRIACANVLSDLyAMGV 
TECDNMLMLLGVSNKMTDRERDKVMPLI I QS FKDAAEEAGMS VMVSQTVLNPW I VLGG 
VTTTVFQ PNE F I MPDNAVPGDVLVLTKPLGTQVAVA VHQWLD I PLKWNKI KLWTEDV 
E LANQEAMMNMVR LNRT AAG LMHT FN AHMATD I TG FG I LGHVQNLAKQQRNEVS FVI H 
NLLV UAKMAA VS KACGNMF S LMHGTCPETSGG LL I C L PCQQAARFCAE I KS P KYS EGH 
QAWI IG I VE KGNHTAR I I D KPQI I KVA PQVATQNVNLT PGATS 



Further analysis of the NOV111a protein yielded the following properties shown in 
Table 111B. 



Table 111B. Protein Sequence Properties NOV111a 


PSort 
analysis: 


0.8500 probability located in endoplasmic reticulum (membrane); 0.4400 
probability located in plasma membrane; 0.1000 probability located in 
mitochondrial inner membrane; 0.1000 probability located in Golgi body 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV111a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 111C. 



Table 111C. Geneseq Results for NOV111a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV111a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB58174 


Lung cancer associated polypeptide 
sequence SEQ ID 512 - Homo 
sapiens, 250 aa. [WO200055180-A2, 
21-SEP-2000] 


166..391 
20..243 


168/227 (74%) 
189/227 (83%) 


2e-88 


AAO01161 


Human polypeptide SEQ ID NO 
15053 - Homo sapiens, 122 aa. 
[WO200164835-A2, 07-SEP-2001] 


147..264 
1..118 


81/119(68%) 
92/119(77%) 


2e-36 


AAB53700 


Human colon cancer antigen protein 
sequence SEQ ID NO: 1240 - Homo 
sapiens, 106 aa. [WO200055351-A1, 
21-SEP-2000] 


42..99 
1..58 


53/58 (91%) 
54/58 (92%) 


1e-24 



In a BLAST search of public sequence databases, the NOV111a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 111D. 



Table 111D. Public BLASTP Results for NOV111a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV111a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9BVT4 


SELENOPHOSPHATE 
SYNTHETASE, HUMAN 
SELENIUM DONOR PROTEIN - 
Homo sapiens (Human), 392 aa. 


1..391 
1..392 


364/392 (92%) 
367/392 (92%) 


0.0 


P49903 


Selenide,water dikinase 1 (EC 
2.7.9.3) (Selenophosphate synthetase 
1) (Selenium donor protein 1) - Homo 
sapiens (Human), 383 aa. 


1..375 
1..376 


348/376 (92%) 
351/376 (92%) 


0.0 


AAC50958 


SELENOPHOSPHATE 
SYNTHETASE 2 - Homo sapiens 
(Human), 448 aa. 


2..391 
33..441 


272/411 (66%) 
313/411 (75%) 


e-147 


Q99611 


Selenide,water dikinase 2 (EC 
2.7.9.3) (Selenophosphate synthetase 
2) (Selenium donor protein 2) - Homo 
sapiens (Human), 448 aa. 


2..391 
33..441 


272/411 (66%) 
313/411(75%) 


e-147 


AAC53024 


SELENOPHOSPHATE 
SYNTHETASE 2 - Mus musculus 
(Mouse), 452 aa. 


2..387 
36..441 


267/407 (65%) 
307/407 (74%) 


e-146 



PFam analysis predicts that the NOV111a protein contains the domains shown in the 
Table 111E. 



Table 111E. Domain Analysis of NOV111a 


Pfam Domain 


NOV111a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


AIRS: domain 1 of 1 


32..188 


29/180 (16%) 
113/180(63%) 


3e-18 


AIRS_C: domain 1 of 1 


191..367 


34/197 (17%) 
125/197(63%) 


1.1e-20 



Example 112. 



The NOV112 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 112A. 



Table 112A. NOV112 Sequence Analysis 



SEQ ID NO: 313 


1544 bp 


NOV112a, 

CG59625-01 DNA Sequence 


CGATGGGACACAGACAGGTCACCCCAGCTCTGATCTTTGCCATCACAGTTGCTACAAT 
CGG CTCTTTCCAGTTTGGCTACAAC ACTGGGGTC ATC AATGCTC CTGAGACGGTGCAG 
ATCATAAAGGAATTTATCAATAAAACTTTGACGGACAAGGCAAATGCCCCTCCCTCTG 
AGGTGCTGCTCACGAATCTCTGGTCCTTGTCTGTGGCCATATTTTCCGTCGGGGGTAT 
GATCGGCTCCTTTTCCGTCGGACTCTTTGTTAACCGCTTTGGCAGGAGGCGCAATTCA 
ATGCTGATTGTCAACCTGTTGGCTGCCACTGGTGGCTGCCTTATGGGACTGTGTAAAA 
TAGCTGAGTCAGTTGAAATGCTGATCCTGGGCCGCTTGGTTATTGGCCTCTTCTGCGG 
ACTCTGCACAGGTTTTGTGCCCATGTACATTGGAGAGATCTCGCCTACTGCCCTGAGG 
GGTGCCTTTGGCACTCTCAACCAGCTGGGCATAGTTATTGGAATTCTGGTGGCCCAGG 
TAATCTTTGGTCTGGAACTCATCCTTGGGTCTGAAGAGCTATGGCCGGTGCTATTAGG 
CTTTACCATCCTTCCAGCTATCCTGCAAAGTGCAGCCCTTCCATGTTGCCCTGAAAGT 
CCCAGATTTTTGCTCATTAACAGAAAAAAAGAGGAGAATGCTACGCGGGTCCTCCAGC 
GGTTGTGGGGCACCCAGGATGTATCCCAAGACATCCAGGAGATGAAAGATGAGAGTGC 
AAGGATGTCACAAGAAAAGCAAGTCACCGTGCTGGAGCTCTTTAGAGTGTCCAGCTAC 
CGACAGCCCATCATCATTTCCATTGTGCTCCAGCTCTCTCAGCAGCTCTCTGGGATCA 
ATGCTGTGGTGTTCTATTACTCAACAGGAATCTTCAAGGATGCAGGTGTTCAACAGCC 
C^TCTATGCC^CCATCAGCGCGGGTGTGGTTAATACTATCTTCACTTTACTTTCTGTA 
GTAGCTCAGATGCTGTTTTCATGGAAAGGAAAACTGAAGTTTCATGTCATAACTGTTT 
CTTTGTTATTAAAGCTGGGTTACACTGTCTTTAAATTTAATCTTCTGTGTTCCTTCCT 
CTTACAGAATCACTATAATGGGATGAGCTTTGTCTGT ATTGGGG CTATCTTGGT CTTT 
GTGG CCTGTTTTG AAATTGGACC AGG CCC CATTCCCTGGTTTATTGTGGC CGAACTCT 
TCAGCCAGGGCCCCCGCCCAGCTGCGATGGCAGTGGCCGGCTGCTCCAACTGGACCTC 
CAACTTCCTAGTCGGATTGCTCTTCC CCTCTGCTG CTTACTATTTAGGAG CCT ACGTT 
TTTATTATCTTCACCGGCTTCCTCATTAC CTTCTTGGCCTTT ACCTTCTT CAAAGTCC 
CTGAG ACCCGTGGCAGG ACTTTTGAGGAT ATCACACGGGC CTTTGAAGGG CAGG C ACA 
CGGTGCAGATAGATCTGGGAAGGACGGCGTCATGGGGATGAACAGCATCGAGCCTGCT 
AAGGAGACCACCACCAATGTCTAAGTCATGCCTCCT 




ORF Start: ATG at 3 


ORF Stop: TAA at 1530 




SEQ ID NO: 314 


509 aa 


MW at 55571.7kD 


NOV112a, 

CG59625-01 Protein Sequence 


MGHRQVTPALI FAITVATIGS FQFG YNTGVI NAPETVQI I KE FINKTLTDKANAPPSE 
VLLTNLWSLSVAIFSVGGMIGSFSVGLFWRFGRRRNS^IVNLIAATGGCLMGIjCKI 
AESVEMLI LGRLVIGLFCGLCTGFVPMYIGEI SPTALRGAFGTLNQLGI VIGI LVAQV 
IFGLELILGSEELWPVLLGFTILPAIIiQSAALPCCPESPRFLLINRKKEENATRVLQR 
LWGTQD VS QD I QEMKDE S ARMS QEKQVTVLE LFRVS S YRQ P 1 1 1 S I VLQLS QQLSG I N 
AWFYYSTGI FKDAGVQQPI YATI SAGWNTI FTLLS WAQMLFSWKGKLKFHVI TVS 
LLLKLGYTVFKFNLLCSFLLQNHYNGMSFVCIGAILVFVACFEIGPGPIPWFIVAELF 
SQGPRPAAMAVAGCSNWTSNFLVGLLFPSAAYYLGAYVFIIFTGFLITFLAFTFFKVP 
ETRGRT FEDI TRAFEGQAHGADRSGKDGVMGMNS I E PAKETTTNV 



Further analysis of the NOV112a protein yielded the following properties shown in 
Table 112B. 



Table 112B. Protein Sequence Properties NOV112a 


PSort 
analysis: 


0.6400 probability located in plasma membrane; 0.4600 probability located in 
Golgi body; 0.3700 probability located in endoplasmic reticulum (membrane); 
0.1000 probability located in endoplasmic reticulum (lumen) 


SignalP 
analysis: 


Likely cleavage site between residues 22 and 23 



A search of the NOV112a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 112C. 



Table 112C. Geneseq Results for NOV112a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV112a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAY27289 


Glucose transporter protein GLUT3 - 
Homo sapiens, 494 aa. [US5942398- 
A, 24-AUG-1999] 


1..505 
1..492 


389/505 (77%) 
431/505 (85%) 


0.0 


AAR11360 


Glucose Transporter Protein from 
CHO cells - Cricetulus sp, 492 aa. 
[WO9103554-A, 21-MAR-1991] 


4..491 
6..481 


289/489 (59%) 
364/489 (74%) 


e-156 


AAW17835 


Human glucose transporter GLUT-1 
- Homo sapiens, 492 aa. 
[WO9715668-A2, 01-MAY-1997] 


4..491 
6..481 


287/489 (58%) 
362/489 (73%) 


e-155 


AAW93000 


Human GLUT1 protein - Homo 
sapiens, 492 aa. [WO9618957-A1, 
20-JUN-1996] 


4..491 
6..481 


284/489 (58%) 
360/489 (73%) 


e-153 


AAB30522 


Amino acid sequence of a consensus 
GLUT polypeptide - Synthetic, 493 
aa. [US6136547-A, 24-OCT-2000] 


6..501 
10..490 


289/496 (58%) 
357/496 (71%) 


e-151 



In a BLAST search of public sequence databases, the NOV112a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 112D. 



Table 112D. Public BLASTP Results for NOV112a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV112a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


P11169 


Solute carrier family 2, facilitated 
glucose transporter, member 3 
(Glucose transporter type 3, brain) - 
Homo sapiens (Human), 496 aa. 


1..509 
1..496 


446/510(87%) 
468/510(91%) 


0.0 


P47842 


Solute carrier family 2, facilitated 
glucose transporter, member 3 
(Glucose transporter type 3, brain) - 
Canis familiaris (Dog), 495 aa. 


1..507 
1..494 


400/507(78%) 
446/507 (87%) 


0.0 


P47843 


Solute carrier family 2, facilitated 
glucose transporter, member 3 
(Glucose transporter type 3, brain) - 
Ovis aries (Sheep), 494 aa. 


1..505 
1..492 


389/505(77%) 
431/505(85%) 


0.0 


P58352 


Solute carrier family 2, facilitated 
glucose transporter, member 3 
(Glucose transporter type 3, brain) - 
Bos taurus (Bovine), 494 aa. 


1..505 
1..492 


390/505(77%) 
431/505 (85%) 


0.0 








Q07647 


Solute carrier family 2, facilitated 
glucose transporter, member 3 
(Glucose transporter type 3, brain) - 
Rattus norvegicus (Rat), 493 aa. 


1..508 
1..492 


380/508(74%) 
422/508(82%) 


0.0 



PFam analysis predicts that the NOV112a protein contains the domains shown in the 
Table 112E. 



Table 112E. Domain Analysis of NOV112a 


Pfam Domain 


NOV112a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


Herpes_glycop: domain 1 of 1 


1..249 


40/417 (10%) 
171/417 (41%) 


7.2 


GntP_permease: domain 1 
of 1 


65..329 


70/478 (15%) 
185/478 (39%) 


2.5 


sugar_tr: domain 1 of 1 


12..478 


188/503 (37%) 
410/503 (82%) 


2.2e-158 



Example 113. 

The NOV113 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 113A. 



Table 113A. NOV113 Sequence Analysis 




SEQ ID NO: 315 


1731 bp 


NOV113a, 

CG59887-01 DNA Sequence 


ACTACTTCGCCGACACTCGCCAGCCTCGGCTACGAGCAAAAAATGOVCCGCACCATGA 


GCTCGTTCACCTCGTTTGCCCTGGCCTTTTCCATGGTCTCGATCAACACCGGCGTGGT 
CACGCTGTTCGCCGACCCGTTCAACCGCGTCGGGGGCATCGGCATCCTCCTGTGGCTG 
TTGGTGATCCCGCTGGTGTGCTGCATCGTCATGGTCTACTGCCACCTGGCCGGGCGCA 
TTCCGCTCACCGGCTACGCCTACCAATX3GTCCAGCCGATTGGCGGGCAATCACTTCGG 
CTGGTTTACCGGCTGGGTGGCGTTCACCTCGTTTGTCGCCGGTACAGCCGCCACCTCG 
GCGGCCATCGGTACGGTGTTCGCACCGGAGATCTGGGCCAACCCGACACAGGGTCAGA 
TCCAGGGCCTG AG CATCGG CG CCACG CTGGTGGTGGGCTTGCTG AATATCTGCGGG AT 
TCGCCTGGCCACC CGG ATCAACG ACATCGGCG CGATCATCGAAAT CAT CGGCACGGT A 
CTCCTGGCGATTGCGTTGTTCTTCGGGGTGTTTTTCTTCTTTGAGCACACCCAGGGCG 
TGGCGATCCTGACCTCCGCGCAACCAGTGAGCGGCGGCACGCTCAGCTTCACCACCAT 
CGCCCTCGCCACCTTGCTGCCGGTCTCGGTGCTGCTGGGTTGGGAAGGCGCCGCCGAC 
CTGTCCGAGGAAACCAAGGACCCACGCCGCGCCGCGCCCCGGGCGATGATTCGTGCGG 
TGCTGGTGTCCAGCGTATTGGGCTTCGTGGTGTTCGCCTTGCTGAGCATCGCGATCCC 
GGGCTCGGTCAGCGAACTGCTCAGCCACAGCGAAAACCCGGTGATCAATATCGTGCGC 
CTGCAACTGGGCAATGCCGCCGGCGTGGGCATGATCGTGATCGCTTTCGCCTCGATCC 
TCGCCTGCCTGATCGCCAACATGGCGGTGGCCACGCGCATGACCTTCGCCCTGTCCCG 
GGACAACATGCTGCCGGGCTCCAAGGTGCTGGCGAAGATCAACCCGCACTTCGGCACG 
CCGGTCGCCGCCATCGTGCTGATCACCGCCATCGCCGTGCTGCTGAACCTGGCGAGTG 
GCGGGTTTGTCACGGCGATCTACTCGATGGTCGGCCTGACCTACTACTGCACTTACCT 
GCTX3ACGCTCATTGCCGCGTACCTGGCCTATAAAAACGGCCGGATGCCGGGGGCGCCT 
GCGGGCGTGTTCAGCCTGGGCCGCTGGTTGCTGCCGATGATTATCCTCGGCGGCCTGT 
GGGCCATCGCGGTGATCCTGACCCTGAGCGTGCCGGAAGAAAGCCACACTGGCGCTAT 
CACCACCGGGGTTACACTCGGCGTGGGCGTGTTGTGGTGGTTGTTTTCACTGCX3CACG 
CGCCTTAACAATGGCACCGCCGGGCCGAGCGGCAAATTGCTCGACCACTAaCCGCTGA 
TTGCAGCCAAAAGACAAAACCCCGAACACCGGGGTTTTGTCTTGTCACCTCCAAGGAG 


CTTCCCGATGTTTGAACAGGCCAGCTGGCTCAATCAACCCCAGCATTGGCX3CCX3AGAA 


GGCGAGCGACTCAAGGTCCGCACCGATGCCAGTACCGATTTCTGGCGTGAAACCCACT 


ATGGTTTTGTACGCGACAACGGGCATTTCCTGTTTGTTGAAACCGACX3GCGACTTTAC 


CGCCCAAGTCAAAATCCACAGTGAGTTTACCCACCTGTATGACCTTCGC 




ORF Start: ATG at 43 


ORF Stop: TAG at 1441 




SEQ ID NO: 316 


466 aa 


MW at 49070.4kD 


NOV113a, 

CG59887-01 Protein Sequence 


MHRTMSSFTSFALAFSMVSINTGWTLFADPFNRVGGIGILLWLLVIPLVCCIVMVYC 
HLAGRI PLTG YAYQWSSRLAGNHFGWFTGWVAFTS FVAGTAATSAAIGTVFAPE I WAN 
PTQGQIQGLSIGATLWGLLNICGIRLATRINDIGAIIEIIGTVLLAIALFFGVFFFF 
EHTQGVAILTSAQPVSGGTLSFTTIALATLLPVSVLLGWEGAADLSEETKDPRRAAPR 
AMIRAVLVSSVLGFVVFALLSIAIPGSVSELLSHSENPVINIVRLQLGNAAGVGMIVI 
AFASILACLIANMAVATRMTFALSRDNMLPGSKVLAKINPHFGTPVAAIVLITAIAVL 
LNLASGGFVTAI YS MVG LTYYCTYLLTL I AA YLA YKNGRM PGAPAGVF S LGRWLLPM I 
I LGGLWAI AVI LTLS VPEESHTGAI TTGVTLG VGVLWWLFSLRTRLNNGTAGPSGKLL 
DH 




SEQ ID NO: 317 


1433 bp 


NOV113b, 

CG59887-02 DNA Sequence 


AAAAATGCACCGCACCATGAGCTCGTTCACCTCGTTTGCCCTGGCCTTTTCCATGGTC 
TCGATCAACACCGGCGTGGTCACGCTGTTCGCCGACCCGTTCAACCGCGTCGGGGGCA 
TCGGCATCCTCCTGTGGCTGTTGGTGATCCCGCTGGTGTGCTGCATCGTCATGGTCTA 
CTGCCACCTGGCCGGGCGCATTCCGCTCACCGGCTACGCCTACCAATGGTCCAGCCGA 
TTGGCGGGCAATCACTTCGGCTGGTTTACCGGCTGGGTGGCGTTCACCTCGTTTGTCG 
CCGGTACAGCCGCCACCTCGGCGGCCATCGGTACGGTGTTCGCACCGGAGATCTGGGC 
CAACCCGACACAGGGTCAGATCCAGGGCCTGAGCATCGGCGCCACGCTGGTGGTGGGC 
TTGCTGAATATCTG CGGGATTCG CCTGGCCACCCGG ATCAACGACATCGGCGCGATCA 
TCGAAATCATCGGCACGGTACTGCTGGCGATTGCGTTGTTCTTCGGGGTGTTTTTCTT 
CTTTGAGCACACCCAGGGCGTGGCGATCCTGACCTCCGCGCAACCAGTGAGCGGCGGC 
ACGCTCAGCTTCACCACCATCGCCCTCGCCACCTTGCTGCCGGTCTCGGTGCTGCTGG 
GTTGGGAAGGCGCCGCCGACCTGTCCGAGGAAACCAAGGACCCACGCCGCGCCGCGCC 
CCGGGCGATGATTCGTGCGGTGCTGGTGTCCAGCGTATTGGGCTTCGTGGTGTTCGCC 
TTGCTGAGCATCGCGATCCCGGGCTCGGTCAGCGAACTGCTCAGCCGCAGCGAAAACC 
GGGTGATCAATATCGTGCGCCTGCAACTGGGCAATGCCGCCGGCGTGGGCATGGTCGT 
GATCGCTTTCGCCTCGATCCTCGCCTGCCTGATCGCCAACATGGCGGTGGCCACGCGC 
ATGACCTTCGCCCTGTCCCGGGACAACATGCTGCCGGGCTCCAAGGTGCTGGCGAAGA 
TCAACCCGCACTTCGGCACGCCGGTCGCCGCCATCGTGCTGATCACCGCCATCGCCGT 
GCTGCTGAACCTGGCGAGTGGCGGGTTTGTCACGGCGATCTACTCGATGGTCGGCCTG 
ACCTACTACTGCACTTACCTGCTGACGCTGATTGCCGCGTACCTGGCCTATAAAAACG 
GCCGGATGCCGGGGGCGCCTGCGGGCGTGTTCAGCCTGGGCCGCTGGTTGCTGCCGAT 
GATTATCCTCGGCGGCCTGTGGGCCATCGCGGTGATCCTGACCCTGAGCGTGCCGGAA 
GAAAGCCACACTGGCGCTATCACCACCGGGGTTACACTCGGCGTGGGCGTGTTGTGGT 
GGTTGTTTTCACTG CGCACGCGC CTTAACAATGG CACCGCCGGGCCGAGCGGCAAATT 
GCTCGACCACTAGCCGCTGATTGCAGCCAAAAGACAAAACC 




ORF Start: ATG at 5 


ORF Stop: TAG at 1403 




SEQIDNO:318 


466 aa 


MW at 49075.4kD


NOV113b,

CG59887-02 Protein Sequence 


MHRTMSSFTSFALAFSMVSINTGVVTLFADPFNRVGGIGILLWLLVIPLVCCIVMVYC
HLAGRIPLTGYAYQWSSRLAGNHFGWFTGWVAFTSFVAGTAATSAAIGTVFAPEIWAN
PTQGQIQGLSIGATLVVGLLNICGIRLATRINDIGAIIEIIGTVLLAIALFFGVFFFF
EHTQGVAILTSAQPVSGGTLSFTTIALATLLPVSVLLGWEGAADLSEETKDPRRAAPR
AMIRAVLVSSVLGFVVFALLSIAIPGSVSELLSRSENPVINIVRLQLGNAAGVGMVVI
AFASILACLIANMAVATRMTFALSRDNMLPGSKVLAKINPHFGTPVAAIVLITAIAVL
LNLASGGFVTAIYSMVGLTYYCTYLLTLIAAYLAYKNGRMPGAPAGVFSLGRWLLPMI
ILGGLWAIAVILTLSVPEESHTGAITTGVTLGVGVLWWLFSLRTRLNNGTAGPSGKLL
DH
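
The ORF coordinates and the protein length reported in these sequence tables can be cross-checked with simple arithmetic. The following is a minimal Python sketch, assuming that both positions are 1-based and that each refers to the first base of its codon (a convention inferred from the numbers in the tables, not stated in the text). The function name `orf_length_aa` is hypothetical.

```python
def orf_length_aa(orf_start: int, orf_stop: int) -> int:
    """Return the number of residues encoded between a start and a stop codon.

    orf_start: 1-based position of the first base of the ATG (assumed convention).
    orf_stop:  1-based position of the first base of the stop codon (assumed convention).
    The stop codon itself is not counted as a residue.
    """
    coding_bases = orf_stop - orf_start
    if coding_bases <= 0:
        raise ValueError("stop codon must lie downstream of the start codon")
    if coding_bases % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return coding_bases // 3


if __name__ == "__main__":
    # NOV113b: ORF Start ATG at 5, ORF Stop TAG at 1403
    print(orf_length_aa(5, 1403))
```

For NOV113b, an ATG at 5 and a TAG at 1403 give 466 codons, which agrees with the 466 aa stated for SEQ ID NO: 318.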



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 113B.



Table 113B. Comparison of NOV113a against NOV113b. 
Protein Sequence


NOV113a Residues/
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV113b


1..466 
1..466 


343/466 (73%) 
344/466 (73%) 
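
The percentages in the comparison and BLAST-style tables appear to be the ratio of matched residues to alignment length, truncated to a whole number. The following Python sketch works under that assumption; truncation rather than rounding is inferred from entries such as 217/486 being listed as 44%, and the helper name `percent_truncated` is hypothetical.

```python
def percent_truncated(matches: int, alignment_length: int) -> int:
    """Whole-number percentage of matches, truncated (not rounded).

    Truncation is an assumption inferred from the tabulated values,
    not a documented property of the tool that produced them.
    """
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    if not 0 <= matches <= alignment_length:
        raise ValueError("matches must lie between 0 and the alignment length")
    return (100 * matches) // alignment_length


if __name__ == "__main__":
    # Identities and similarities reported for NOV113a against NOV113b
    print(percent_truncated(343, 466), percent_truncated(344, 466))
```

Both 343/466 and 344/466 fall between 73% and 74%, so both are reported as 73%.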



Further analysis of the NOV113a protein yielded the following properties shown in
Table 113C. 



Table 113C. Protein Sequence Properties NOV113a


PSort 
analysis: 


0.6400 probability located in plasma membrane; 0.4600 probability located in 
Golgi body; 0.3700 probability located in endoplasmic reticulum (membrane); 
0.1000 probability located in endoplasmic reticulum (lumen) 


SignalP 
analysis: 


Likely cleavage site between residues 59 and 60 



A search of the NOV113a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 113D.



Table 113D. Geneseq Results for NOV113a


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV113a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG49885 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 63155 - Arabidopsis 
thaliana, 504 aa. [EP1033405-A2, 06- 
SEP-2000] 


1..449 
17..492 


122/486 (25%)
217/486 (44%)


3e-31 


AAG49884 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 63154 - Arabidopsis 
thaliana, 516 aa. [EP1033405-A2, 06- 
SEP-2000] 


1..449 
29..504 


122/486 (25%)
217/486 (44%)


3e-31 


AAG20282


Arabidopsis thaliana protein fragment 
SEQ ID NO: 22407 - Arabidopsis 
thaliana, 504 aa. [EP1033405-A2, 06- 
SEP-2000] 


1..449 
17..492 


122/486 (25%)
217/486 (44%)


3e-31 


AAG20281


Arabidopsis thaliana protein fragment 
SEQ ID NO: 22406 - Arabidopsis 
thaliana, 516 aa. [EP1033405-A2, 06- 
SEP-2000] 


1..449 
29..504 


122/486 (25%)
217/486 (44%)


3e-31 


AAG20280 








3e-31 
SEQ ID NO: 22405 - Arabidopsis
thaliana, 528 aa. [EP1033405-A2, 06- 
SEP-2000] 


41..516 


217/486 (44%) 





In a BLAST search of public sequence databases, the NOV113a protein was found to
have homology to the proteins shown in the BLASTP data in Table 113E.



Table 113E. Public BLASTP Results for NOV113a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV113a 

Residues/ 

Match 
Residues 


Identities/ 
Similarities for
the Matched
Portion


Expect
Value 




T>r> fXQ A DT "D A A/TTMO 

PROBABLE AMINO
ACID/METABOLITE PERMEASE - 
Streptomyces coelicolor, 504 aa. 


d.Ajx) 
27..481 


ijy/^oy (Zy /o) 
214/469 (44%) 


ze-H-i 




AMTMO ArmA/fFTAROT TTF 

/VlVlliNVw/ tW^lXJf J. VIC 1 /VDULJ 1 

PERMEASE - Rhizobium loti 
(Mesorhizobium loti), 518 aa. 


27..485 


209/466 (44%) 




Q92NI8 


PUTATIVE AMINO-ACID 
PERMEASE PROTEIN - Rhizobium 
meliloti (Sinorhizobium meliloti), 
515 aa. 


1..449 
25..487 


122/475 (25%) 
204/475 (42%) 


1e-32


O22509


PUTATIVE AMINO ACID OR 
GABA PERMEASE - Arabidopsis 
thaliana (Mouse-ear cress), 516 aa. 


1..449 
29..504 


122/486 (25%) 
217/486(44%) 


1e-30


Q9ZU50 


PUTATIVE AMINO ACID 
PERMEASE - Arabidopsis thaliana 
(Mouse-ear cress), 517 aa. 


1..449 
29..505 


120/487 (24%) 
216/487 (43%) 


2e-28 



PFam analysis predicts that the NOV113a protein contains the domains shown in
Table 113F.



Table 113F. Domain Analysis of NOV113a


Pfam Domain 


NOV113a Match
Region


Identities/
Similarities
for the Matched
Region


Expect 
Value 


oxidored_q3: domain 1 of 1 


162..307


28/182 (15%)
91/182 (50%)


3.7 


ISK_Channel: domain 1 of 1 


196..326 


32/136 (24%) 
55/136 (40%) 


8.8 
ABC2_membrane: domain 1 of 1


122..377


46/273 (17%)
154/273 (56%)


8.3 


SSF: domain 1 of 1


7..394


77/470 (16%)
222/470 (47%)


7.8 


Aa_trans: domain 1 of 1


29..417


67/483 (14%)
236/483 (49%)


9.7 


aa_permeases: domain 1 of 1 


1..451 


86/516 (17%)
287/516 (56%)


1.1e-05



Example 114. 

The NOV114 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 114A.



Table 114A. NOV114 Sequence Analysis 




SEQ ID NO: 319


876 bp 


NOV114a,

CG59861-01 DNA Sequence 


AACTTGCTTTTGGGAGCCAGCGGTATGGCGTCGGGCTGCAAGATTGGCCCGTCCATCC
TCAACAGCGACCTGGCCAATTTAGGGGCCGAGTGCTCCCGGATGCTAGACTCTGGGGC
CGATTATCTGCACCTGGACGTAATGGACGGGCATTTTGTTCCCAACATCACCTTTGGT 
CACCCTGTGGTGGAAAGCCTTCGAAAGCAGCTAGGCCAGGACCCTTTCTTTGACATGC 
ACATGATGGTGTCCAAGCCAGAACAGTGGGTAAAGCCAATGGCTGTAGCAGGAGCCAA
TCAGTACACCTTTCATCTCGAGGCTACTGAGAACCCAGGGGCTTTGATTAAAGACATT 
CGGGAGAATGGGATGAAGGTTGGCCTTGCCATCAAACCAGGAACCTCAGTTGAGTATT 
TGGCACCATGGGCTAATCAGATAGATATGGCCTTGGTTATGACAGTGGAACCGGGGTT
TGGAGGGCAGAAATTCATGGAAGATATGATGCCAAAGGTTCACTGGTTGAGGACCCAG 
TTCCCATCTTTGGATATAGAGGTCGATGGTGGAGTAGGTCCTGACACTGTCCATAAAT 
GTGCAGAGGCAGGAGCTAACATGATTGTGTCTGGCAGTGCTATTATGAGGAGTGAAGA 
CCCCAGATCTGTGATCAATCTATTAAGAAATGTTTGCTCAGAAGCTGCTCAGAAACGT 
TCTCTTGATCGGTGAAACCATAAGGAGCCCAGTGTTCCTGTTCATGAAATCTCCCTTT 


TACTGGAAAACAGGAATATTGACTACCAAATCACAATGCAATTGAAGCCGTACTGCTT 


TTTTGAGCAGTTATTCATTCCAGTGATTAAAACTGATTGTGCAGAATAAAAAAAAAAA 


AAAAAA 




ORF Start: ATG at 25 


ORF Stop: TGA at 709 




SEQ ID NO: 320 


228 aa 


MW at 24901.4kD


NOV114a, 

CG59861-01 Protein Sequence 


MASGCKIGPSILNSDLANLGAECSRMLDSGADYLHLDVMDGHFVPNITFGHPVVESLR
KQLGQDPFFDMHMMVSKPEQWVKPMAVAGANQYTFHLEATENPGALIKDIRENGMKVG
LAIKPGTSVEYLAPWANQIDMALVMTVEPGFGGQKFMEDMMPKVHWLRTQFPSLDIEV
DGGVGPDTVHKCAEAGANMIVSGSAIMRSEDPRSVINLLRNVCSEAAQKRSLDR




SEQ ID NO: 321 


730 bp 


NOV114b, 

CG59861-02 DNA Sequence 


AACTTGCTTTTGGGAGCCAGCGGTATGGCGTCGGGCTGCAAGATTGGCCCGTCCATCC 
TCAACAGCGACCTGGCCAATTTAGGGGCCGAGTGCCTCCGGATGCTAGACTCTGGGGC 
CGATTATCTGCACCTGGACGTAATGGACGGGCATTTTGTTCCCAACATCACCTTTGGT
CACCCTGTGGTAGAAAGCCTTCGAAAGCAGCTAGGCCAGGACCCTTTCTTTGACATGC 
ACATGATGGTGTCCAAGCCAGAACAGTGGGTAAAGCCAATGGCTGTAGCAGGAGCCAA 
TCAGTACACCTTTCATCTCGAGGCTACTGAGAACCCAGGGGCTTTGATTAAAGACATT
CGGGAGAATGGGATGAAGGTTGGCCTTGCCATCAAACCAGGAACCTCAGTTGAGTATT
TGGCACCATGGGCTAATCAGATAGATATGGCCTTGGTTATGACAGTGGAACCGGGGTT
TGGAGGGCAGAAATTCATGGAAGATATGATGCCAAAGGTTCACTGGTTGAGGACCCAG 
TTCCCATCTTTGGATATAGAGGTCGATGGTGGAGTAGGTCCTGACACTGTCCATAAAT 
GTGCAGAGGCAGGAGCTAACATGATTGTGTCTGGCAGTGCTATTATGAGGAGTGAAGA 
CCCCAGATCTGTGATCAATCTATTAAGAAATGTTTGCTCAGAAGCTGCTCAGAAACGT 
TCTCTTGATCGGTGAAACCATAAGGAGCCCAGTG 




ORF Start: ATG at 25 


ORF Stop: TGA at 709 




SEQ ID NO: 322 


228 aa 


MW at 24927.5kD


NOV114b, 

CG59861-02 Protein Sequence 


MASGCKIGPSILNSDLANLGAECLRMLDSGADYLHLDVMDGHFVPNITFGHPVVESLR
KQLGQDPFFDMHMMVSKPEQWVKPMAVAGANQYTFHLEATENPGALIKDIRENGMKVG
LAIKPGTSVEYLAPWANQIDMALVMTVEPGFGGQKFMEDMMPKVHWLRTQFPSLDIEV
DGGVGPDTVHKCAEAGANMIVSGSAIMRSEDPRSVINLLRNVCSEAAQKRSLDR



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 114B.



Table 114B. Comparison of NOV114a against NOV114b. 


Protein Sequence 


NOV114a Residues/
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV114b


1..228 
1..228 


227/228 (99%) 
227/228 (99%) 
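
The single mismatch behind a value such as 227/228 can be located by comparing two sequences position by position. The Python sketch below assumes the sequences are already aligned without gaps, and it uses only the first 30 residues of NOV114a and NOV114b as read from the listings in Table 114A. The function name `compare_aligned` is hypothetical.

```python
from typing import List, Tuple


def compare_aligned(seq_a: str, seq_b: str) -> Tuple[int, List[int]]:
    """Count identical positions in two gap-free, equal-length sequences.

    Returns the number of identities and the 1-based positions that differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length (no gaps handled)")
    mismatches = [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    return len(seq_a) - len(mismatches), mismatches


if __name__ == "__main__":
    # First 30 residues only (an excerpt, not the full 228 aa sequences)
    nov114a_head = "MASGCKIGPSILNSDLANLGAECSRMLDSG"
    nov114b_head = "MASGCKIGPSILNSDLANLGAECLRMLDSG"
    print(compare_aligned(nov114a_head, nov114b_head))
```

In this excerpt the only difference is at residue 24 (S in NOV114a, L in NOV114b), which is consistent with one mismatch over the full 228 residues.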



Further analysis of the NOV114a protein yielded the following properties shown in
Table 114C. 



Table 114C Protein Sequence Properties NOV114a 


PSort 
analysis: 


0.6500 probability located in cytoplasm; 0.1753 probability located in lysosome 
(lumen); 0.1000 probability located in mitochondrial matrix space; 0.0000 
probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV114a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 114D.



Table 114D. Geneseq Results for NOV114a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent
#, Date]


NOV114a
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM41358 


Human polypeptide SEQ ID NO 
6289 - Homo sapiens, 247 aa. 
[WO200153312-A1, 26-JUL-2001] 


1..228 
20..247 


227/228 (99%)
227/228 (99%)


e-132 


AAM41357 


Human polypeptide SEQ ID NO 
6288 - Homo sapiens, 247 aa. 
[WO200153312-A1, 26-JUL-2001] 


1..228 
20..247 


227/228 (99%)
227/228 (99%)


e-132 


AAM39571 


Human polypeptide SEQ ID NO 
2716 - Homo sapiens, 228 aa. 
[WO200153312-A1, 26-JUL-2001] 


1..228 
1..228 


227/228 (99%)
227/228 (99%)


e-132 
AAB71912


Human ISOM-4 - Homo sapiens, 
228 aa. [WO200112790-A2, 22-
FEB-2001]


1..228 
1..228 


227/228 (99%)
227/228 (99%)


e-132 


AAM39572


Human polypeptide SEQ ID NO
2717 - Homo sapiens, 246 aa.
[WO200153312-A1, 26-JUL-2001]


1..228 
1..246 


227/246 (92%)
227/246 (92%)


e-129 



In a BLAST search of public sequence databases, the NOV114a protein was found to
have homology to the proteins shown in the BLASTP data in Table 114E.



Table 114E. Public BLASTP Results for NOV114a


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV114a
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96AT9 


HYPOTHETICAL 24.9 KDA 
PROTEIN - Homo sapiens (Human),
228 aa. 


1..228 

1..228


227/228 (99%)
227/228 (99%)


e-131 


Q9BSB5 


HYPOTHETICAL 25.3 KDA 
PROTEIN - Homo sapiens (Human), 
232 aa (fragment). 


1..228 
5..232 


227/228(99%) 
227/228 (99%) 


e-131 


AAH19126 


HYPOTHETICAL 24.9 KDA 
PROTEIN - Mus musculus (Mouse), 
228 aa. 


1..228 
1..228 


221/228 (96%) 
226/228 (98%) 


e-129 


O43767


RIBULOSE-5-PHOSPHATE- 
EPIMERASE - Homo sapiens 
(Human), 174 aa (fragment). 


55..228 
1..174 


174/174(100%) 
174/174 (100%) 


2e-98 


Q96N34 


CDNA FLJ31466 FIS, CLONE 
NT2NE2001372, HIGHLY SIMILAR 
TO HOMO SAPIENS PUTATIVE 
RIBULOSE-5-PHOSPHATE- 
EPIMERASE - Homo sapiens 
(Human), 178 aa. 


69..228 
1..178 


160/178 (89%) 
160/178 (89%) 


2e-86 



PFam analysis predicts that the NOV114a protein contains the domains shown in
Table 114F.



Table 114F. Domain Analysis of NOV114a


Pfam Domain 


NOV114a Match
Region 


Identities/ 
Similarities 


Expect 
Value 
Region




Ribul_P_3_epim: domain 1 of 1


6..204 


95/209 (45%) 
174/209(83%) 


1.9e-105 


IGPS: domain 1 of 1 


179..213 


15/35 (43%) 
27/35 (77%)


0.02 


trp_syntA: domain 1 of 1 


34..222 


45/273 (16%) 
124/273(45%) 


2.9 



Example 115. 

The NOV115 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 115A.



Table 115A. NOV115 Sequence Analysis 




SEQ ID NO: 323 


1761 bp 


NOV115a, 

CG59857-01 DNA Sequence 


AGTGTGGTACCTATCTGTCCCCCCTCTGGAGGGGTTGACAAGGGAAAGGGCACCGGGG 


GGCACAGAGATGCAGGACAGATTGCACATCCTGGAGGACCTGAATATGCTCTACATTC 
GGCAGATGGCACTCAGCCTGGAGGACACGGAGTTGCAGAGGAAGCTAGACCATGAGAT
CCGGATGAGGGAAGGGGCCTGTAAGCTGCTGGCAGCCTGCTCCCAGCGAGAGCAGGCT 
CTGGAGGCCACCAAGAGCCTGCTAGTGTGCAACAGCCGCATCCTCAGCTACATGGGCG 
AGCTGCAGCGGCGCAAGGAGGCGCAGGTGCTGGGGAAGACAAGCCGGCGGCCTTCTGA 
CAGTGGCCCGCCCGCTGAGCGCTCCCCCTGCCGCGGCCGGGTCTGCATCTCTGACCTC 
CGGATTCCACTCATGTGGAAGGACACAGAATATTTCAAGAACAAAGACTTGCACCGCT
GGGCTGTGTTCCTGCTGCTGCAGCTGGGGGAACACATCCAGGACACAGAGATGATCCT 
AGTGGACAGGACCCTCACAGACATCTCCTTTCAGAGCAATGTGCTCTTCGCTGAGGCG
GGGCCAGACTTTGAACTGCGGTTAGAGCTGTATGGGGCCTGTGTGGAAGAAGAGGGGG 
CCCTGACTGGCGGCCCCAAGAGGCTTGCCACCAAACTCAGCAGCTCCCTGGGCCGCTC 
CTCAGGGAGGCGTGTCCGGGCATCGCTGGACAGTGCTGGGGGTTCAGGGAGCAGTCCC 
ATCTTGCTCCCCACCCCAGTTGTTGGTGGTCCTCGTTACCACCTCTTGGCTCACACCA 
CACTCACCCTCGCAGCAGTGCAAGATGGATTCCGCACACATGACCTCACCCTTGCCAG
TCATGAGGAGAACCCTGCCTGGCTGCCCCTTTATGGTAGCGTGTGTTGCCGTCTGGCA 
GCTCAGCCTCTCTGCATGACTCAGCCCACTGCAAGTGGTACCCTCAGGGTGCAGCAAG
CTGGGGAGATGCAGAACTGGGCACAAGTGCATGGAGTTCTGAAAGGCACAAACCTCTT 
CTGTTACCGGCAACCTGAGGATGCAGACACTGGGGAAGAGCCGCTGCTTACTATTGCT 
GTCAACAAGGAGACTCGAGTCCGGGCAGGGGAGCTGGACCAGGCTCTAGGACGGCCCT
TCACCCTAAGCATCAGTAACCAGTATGGGGATGATGAGGTGACACACACCCTTCAGAC 
AGAAAGTCGGGAAGCACTGCAGAGCTGGATGGAGGCTCTGTGGCAGCTTTTCTTTGAC
ATGAGCCAATGGAAGCAGTGCTGTGATGAAATCATGAAAATTGAAACTCCTGCTCCCC
GGAAACCACCCCAAGCACTGGCAAAGCAGGGGTCCTTGTACCATGAGATGGCTATTGA 
GCCGCTGGATGACATCGCAGCGGTGACAGACATCCTGACCCAGCGGGAGGGCGCAAGG
CTGGAGACACCCCCACCCTGGCTGGCAATGTTTACAGACCAGCCTGCCCTGCCTAACC 
CCTGCTCGCCTGCCTCAGTGGCCCCAGCCCCAGACTGGACCCACCCCCTGCCCTGGGG 
GAGACCCCGAACCTTTTCCCTGGATGCTGTCCCCCCAGACCACTCCCCTAGGGCTCGC 
TCGGTTGCCCCCCTCCCACCTCAGCGATCCCCACGGACCAGAGGCCTCTGCAGCAAAG 
GCCAACCTCGCACTTGGCTCCAGTCACCAGTGTGAGAGAGAAAGGTGCTGGCATAGGA 
TCTGCCCAGAAGAGAAAATGA 




ORF Start: ATG at 68 


ORF Stop: TGA at 1715 




SEQ ID NO: 324 


549 aa MW at 61171.0kD 


NOV115a,

CG59857-01 Protein Sequence 


MQDRLHILEDLNMLYIRQMALSLEDTELQRKLDHEIRMREGACKLLAACSQREQALEA
TKSLLVCNSRILSYMGELQRRKEAQVLGKTSRRPSDSGPPAERSPCRGRVCISDLRIP
LMWKDTEYFKNKDLHRWAVFLLLQLGEHIQDTEMILVDRTLTDISFQSNVLFAEAGPD
FELRLELYGACVEEEGALTGGPKRLATKLSSSLGRSSGRRVRASLDSAGGSGSSPILL
PTPVVGGPRYHLLAHTTLTLAAVQDGFRTHDLTLASHEENPAWLPLYGSVCCRLAAQP
LCMTQPTASGTLRVQQAGEMQNWAQVHGVLKGTNLFCYRQPEDADTGEEPLLTIAVNK
ETRVRAGELDQALGRPFTLSISNQYGDDEVTHTLQTESREALQSWMEALWQLFFDMSQ
WKQCCDEIMKIETPAPRKPPQALAKQGSLYHEMAIEPLDDIAAVTDILTQREGARLET
PPPWLAMFTDQPALPNPCSPASVAPAPDWTHPLPWGRPRTFSLDAVPPDHSPRARSVA
PLPPQRSPRTRGLCSKGQPRTWLQSPV
Further analysis of the NOV115a protein yielded the following properties shown in
Table 115B.



Table 115B. Protein Sequence Properties NOV115a


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.1707 probability located in lysosome (lumen); 0.1000
probability located in mitochondrial matrix space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV115a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 115C.



Table 115C. Geneseq Results for NOV115a


Geneseq 
Identifier 


Protein/Organism/Length [Patent
#, Date] 


NOV115a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB35241


Human rhotekin - Homo sapiens, 563 
aa. [US6183990-B1, 06-FEB-2001] 


24..549 
37..563 


526/527 (99%)
526/527 (99%)


0.0 


AAY44559 


Human Rhotekin protein - Homo 
sapiens, 563 aa. [WO9958667-A1,
18-NOV-1999] 


24..549
37..563 


526/527 (99%)
526/527 (99%) 


0.0 


AAB35242 


Human rhotekin EST-derived protein 
- Homo sapiens, 527 aa.
[US6183990-B1, 06-FEB-2001] 


24..549 
1..527 


522/527 (99%) 
523/527 (99%) 


0.0 


AAY44560


Human Rhotekin variant protein - 
Homo sapiens, 527 aa. [WO9958667-
A1, 18-NOV-1999]


24..549
1..527


522/527 (99%) 
523/527 (99%) 


0.0 


AAB26790


Human Ras correlative GTP binding 
kinase protein sequence - Homo 
sapiens, 544 aa. [CN1257924-A, 28-
JUN-2000]


24..549 
18..544 


518/527 (98%)
519/527 (98%) 


0.0 



In a BLAST search of public sequence databases, the NOV115a protein was found to
have homology to the proteins shown in the BLASTP data in Table 115D.



Table 115D. Public BLASTP Results for NOV115a
Protein
Accession 
Number 


Protein/Organism/Length 


NOV115a
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


AAH17727 


SIMILAR TO RHOTEKIN - 
Homo sapiens (Human), 550 aa. 


1..549 
1..550 


549/550 (99%) 
549/550 (99%) 


0.0 


Q9BST9 


SIMILAR TO RHOTEKIN - 
Homo sapiens (Human), 587 aa 
(fragment). 


24..549 
61..587 


526/527 (99%) 
526/527 (99%) 


0.0 


Q96PT6 


RTKN - Homo sapiens (Human), 
544 aa. 


24..549 
18..544 


518/527(98%) 
519/527(98%) 


0.0 


Q9HB05 


RHOTEKIN - Homo sapiens 
(Human), 567 aa (fragment). 


24..549 
41..567 


505/527 (95%) 
513/527(96%) 


0.0 


Q61192 


RHOTEKIN - Mus musculus 
(Mouse), 551 aa. 


1..549 
1..551 


477/551 (86%) 
500/551 (90%) 


0.0 



PFam analysis predicts that the NOV115a protein contains the domains shown in
Table 115E.



Table 115E. Domain Analysis of NOV115a


Pfam Domain


NOV115a Match Region


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


HR1: domain 1 of 1 


23..95 


17/87(20%) 
54/87 (62%) 


0.27 


PH: domain 1 of 1 


296..397 


19/102(19%) 
72/102(71%) 


1e-06



Example 116. 

The NOV116 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 116A.



Table 116A. NOV116 Sequence Analysis 




SEQ ID NO: 325 


450 bp 


NOV116a,

CG59855-01 DNA Sequence 


CTGGGAGACTGAAAAAATGCAGACCACCGGGGTATTACTCATTTCTCCAGCTCTGATC 
TGCTGTTGTACCAGGGGTCTAATCAGGCCTGTGTCTGCCTTCTCCTTGAATAGCCCAG
AGAATTCATCTAAACAGCCTTCCTACAGCAGCTCCCCACTCCAGGTGGCCAGACGGGA
GTTCCAGACCAGTGTTGTCTCCCGGGACACTGACACAGCCGCCAAGTTTATTGGTGCT 
GGGTCAGCCACAGTTGGTGTGGCTGATTCAGGGGCTGGCATTGGAGCGGTGTTTGGCA
GCTTGATTATTGTCTATGCCAGGAAGCTGTCTCTCAAGCAGCAACTCCTCTTCTATGC 
CATTCTGGGCTTTGCCCTGTCTGAGGCCATGGGGCTCTTCTGTTTGATGATCTCCTTC
TTCATCCTGTTCGCCATGTGAGGCTCCGTGAGGGTCACCTGCCT 




ORF Start: ATG at 17 


ORF Stop: TGA at 425 




SEQ ID NO: 326 


136 aa MW at 14384.6kD 
NOV116a,

CG59855-01 Protein Sequence 


MQTTGVLLISPALICCCTRGLIRPVSAFSLNSPENSSKQPSYSSSPLQVARREFQTSV
VSRDTDTAAKFIGAGSATVGVADSGAGIGAVFGSLIIVYARKLSLKQQLLFYAILGFA
LSEAMGLFCLMISFFILFAM




SEQ ID NO: 327 


434 bp 


NOV116b, 

CG59855-02 DNA Sequence 


ATGCAGACCACCGGGGTATTACTCATTTCTCCAGCTCTGATCTGCTGTTGTACCAGGG 
GTCTAATCAGGCCTGTGTCTGCCTTCTCCTTGAATAGCCCAGAGAATTCATCTAAACA
GCCTTCCTACAGCAGCTCCCCACTCCAGGTGGCCAGACGGGAGTTCCAGACCAGTGTT 
GTCTCCCGGGACACTGACACAGCCGCCAAGTTTATTGGTGCTGGGTCAGCCACAGTTG 
GTGTGGCTGATTCAGAGGCTGGCATTGGAGCGGTGTTTGGCAGCTTGATTATTGTCTA
TGCCAGGAAGCTGTCTCTCAAGCAGCAACTCCTCTTCTATGCCATTCTGGGCTTTGCC
CTGTCTGAGGCCATGGGGCTCTTCTGTTTGATGATCTCCTTCTTCATCCTGTTCGCCA 
TGTGAGGCTCCGTGAGGGTCACCTGCCT 




ORF Start: ATG at 1 


ORF Stop: TGA at 409 




SEQ ID NO: 328 


136 aa MW at 14456.7kD


NOV116b, 

CG59855-02 Protein Sequence 


MQTTGVLLISPALICCCTRGLIRPVSAFSLNSPENSSKQPSYSSSPLQVARREFQTSV 
VSRDTDTAAKFIGAGSATVGVADSEAGIGAVFGSLIIVYARKLSLKQQLLFYAILGFA
LSEAMGLFCLMISFFILFAM 
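
The relationship between a DNA listing and its predicted protein can be checked by translating the ORF codon by codon. The following is a minimal Python sketch using the standard genetic code; it assumes an uppercase or lowercase sequence containing only A, C, G and T (any whitespace is stripped first), and the names `CODON_TABLE` and `translate` are hypothetical helpers rather than anything defined in this document.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order for each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, orf_start: int = 1) -> str:
    """Translate from a 1-based start position up to the first stop codon.

    Raises KeyError if a codon contains a character other than A, C, G or T.
    """
    seq = "".join(dna.split()).upper()
    residues = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the ORF
            break
        residues.append(aa)
    return "".join(residues)


if __name__ == "__main__":
    # First 24 bases of the NOV116b DNA listing (ORF Start: ATG at 1)
    print(translate("ATGCAGACCACCGGGGTATTACTC"))
```

Translating the first eight codons of the NOV116b DNA listing gives MQTTGVLL, which matches the start of the NOV116b protein sequence.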



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 116B.



Table 116B. Comparison of NOV116a against NOV116b.


Protein Sequence 


NOV116a Residues/
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV116b 


1..136 
1..136 


120/136 (88%) 
120/136 (88%) 



Further analysis of the NOV116a protein yielded the following properties shown in
Table 116C. 



Table 116C. Protein Sequence Properties NOV116a 


PSort 
analysis: 


0.9190 probability located in plasma membrane; 0.3000 probability located in 
lysosome (membrane); 0.1888 probability located in microbody (peroxisome); 
0.1000 probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


Likely cleavage site between residues 28 and 29 



A search of the NOV116a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 116D.
Table 116D. Geneseq Results for NOV116a


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


Residues/ 

Match 
Residues 


Identities/
Similarities for 
the Matched 
Region 


Expect 
Value 




Human colon cancer antigen protein
SEQ ID NO:5906 - Homo sapiens,
142 aa. [WO200122920-A2, 05-APR-
2001]


i 1 16 
7..142 


1 1 S/1 ^fi (RA°/ n \ 

l L Ji 1 JO ^0*r /O ^ 

119/136(86%) 


?p-S7 
j / 




Human cancer associated protein
sequence SEQ ID NO: 1311 - Homo
sapiens, 142 aa. [WO200055350-A1,
21-SEP-2000]


1 1 *\f* 
7..142 


1 1J/ 1 JO ^OH /o J 

119/136(86%) 


9p-^7 
zc~ j / 


A ATT6Q71 ^ 


V_,C11 UCalil prOlCCUVC £>Ct|uCIlLC V^lNl- 

00730, protein #1 - Homo sapiens, 
142 aa. [WO200176532-A2, 18-OCT-
2001]


7 1 1f\ 

7..142 


98/136 (71%)


iC-j\) 


ADJJ 1 Z. V 1 \J ; 


Human ATP synthase subunit
homologue, SEQ ID NO:2386 -
Homo sapiens, 187 aa.
[WO200157188-A2, 09-AUG-2001]


7..136
52..187


85/136 (62%)
98/136 (71%)


2e-36


AAB53428


Human colon cancer antigen protein 
sequence SEQ ID NO:968 - Homo
sapiens, 212 aa. [WO200055351-A1, 
21-SEP-2000] 


7..136 
77..212 


85/136 (62%)
98/136 (71%)


2e-36 



In a BLAST search of public sequence databases, the NOV116a protein was found to
have homology to the proteins shown in the BLASTP data in Table 116E.



Table 116E. Public BLASTP Results for NOV116a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV116a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


P05496 


ATP synthase lipid-binding protein, 
mitochondrial precursor (EC 3.6.1.34) 
(ATP synthase proteolipid P1) (ATPase
protein 9) (ATPase subunit C) - Homo 
sapiens (Human), 136 aa. 


1..136
1..136 


115/136(84%) 
119/136(86%) 


9e-57 


P32876 








1e-54
mitochondrial precursor (EC 3.6.1.34)
(ATP synthase proteolipid P1) (ATPase
protein 9) (ATPase subunit C) - Bos 
taurus (Bovine), 136 aa. 


1..136 


117/136 (85%) 




P17605


ATP synthase lipid-binding protein, 
mitochondrial precursor (EC 3.6.1.34)
(ATP synthase proteolipid P1) (ATPase
protein 9) (ATPase subunit C) - Ovis 
aries (Sheep), 136 aa. 


1..136
1..136


113/136 (83%) 
117/136 (85%) 


2e-54 


Q9CR84 


ATP SYNTHASE C CHAIN 
ISOFORM 1 (EC 3.6.1.34) (LIPID- 
BINDING PROTEIN) (SUBUNIT C) - 
Mus musculus (Mouse), 136 aa. 


1..136 
1..136 


112/136 (82%) 
117/136 (85%) 


1e-53


P48202 


ATP synthase lipid-binding protein, 
mitochondrial precursor (EC 3.6.1.34) 
(ATP synthase proteolipid P1) (ATPase
protein 9) (ATPase subunit C) - Mus 
musculus (Mouse), 136 aa. 


1..136
1..136


112/136 (82%) 
117/136 (85%) 


1e-53



PFam analysis predicts that the NOV116a protein contains the domains shown in
Table 116F.



Table 116F. Domain Analysis of NOV116a


Pfam Domain 


NOV116a Match
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


ATP-synt_C: domain 1 of 1


67..135


31/70(44%) 
57/70(81%) 


2.3e-18 



Example 117. 

The NOV117 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 117A.



Table 117A. NOV117 Sequence Analysis 




SEQ ID NO: 329 


1769 bp 


NOV117a, 

CG59807-01 DNA Sequence 


GAGGTCATGCTGGAGACCTGCGGACTTCTCATGTCTCTGGGCTGTCCTTTGTTCAAAC
CAGAGCTGATCTACCAGTTGGATCACAGACAGGAGCTATGGATGGCTACAAAAGACCT 
CTCCCAAAGCTCCTATCCAGGTGACAACACAAAACCCAAGACCACAGAGCCTACCTTT
TCTCACCTGGCCTTGCCTGAGGAAGTCTTACTCCAGGAACAACTGACACAAGGAGCCT 
CAAAGAACTCCCAATTAGGGCAATCCAAGGATCAGGATGGGCCATCTGAAATGCAAGA
AGTCCACTTGAAAATAGGGATAGGCCCCCAGCGGGGGAAGCTGCTGGAGAAAATGAGT
TCTGAACGTGATGGTTTGGGGTCAGATGATGGTGTATGTACAAAGATTACACAGAAAC
AAGTTTCAACAGAAGGTGATCTCTATGAATGTGATTCACATGGACCAGTTACAGATGC
CTTCATTCGCGAAGAGAAAAATTCCTATAAATGTGAGGAATGCGGGAAAGTGTTTAAA
AAGAATGCCCTCCTTGTTCAGCATGAACGGATTCACACTCAAGTGAAGCCCTATGAAT 
GCACAGAGTGTGGGAAAACCTTTAGCAAGAGCACTCATCTTCTTCAGCACCTCATCAT
CCACACTGGGGAGAAGCCCTATAAGTGCATGGAGTGTGGGAAGGCTTTTAACCGCAGG
TCACACCTCACACGGCACCAGCGGATTCACAGTGGAGAGAAGCCTTATAAGTGCAGTG
AATGTGGAAAGGCCTTCACCCACCGCTCCACTTTTGTCTTGCATCACAGGAGCCACAC 
TGGAGAAAAACCCTTTGTGTGCAAAGAGTGTGGCAAAGCCTTTCGAGATAGGCCAGGT
TTCATTCGACACTACATCATCCACACGGGAGAGAAGCCCTATGAGTGCATTGAGTGTG
GGAAGGCCTTCAACCGCCGGTCATACCTCACGTGGCACCAACAGATTCACACTGGAGT
GAAACCCTTTGAATGCAACGAGTGTGGAAAAGCTTTTTGCGAGAGTGCAGACCTCATT
CAACACTACATTATCCACACTGGGGAGAAGCCCTATAAGTGCATGGAGTGTGGGAAGG 
CGTTCAACCGTAGGTCACACCTCAAGCAGCATCAACGGATTCACACTGGGGAGAAGCC
TTATGAATGCAGTGAATGTGGAAAGGCCTTCACCCACTGCTCCACTTTTGTCTTGCAT 
AAAAGGACCCACACAGGAGAAAAACCCTATGAATGCAAAGAATGTGGAAAAGCCTTTA 
GTGATAGGGCAGACCTCATTCGCCACTTCAGCATCCACACTGGAGAGAAACCCTATGA 
GTGCGTGGAGTGTGGAAAGGCCTTCAACCGCAGCTCACACCTCACGAGGCACCAACAG 
ATTCACACTGGAGAGAAACCCTATGAATGCATCCAGTGTGGGAAAGCCTTTTGCCGGA 
GCGCAAACCTTATTCGACACTCCATCATTCACACTGGAGAGAAGCCGTATGAATGCAG 
TGAGTGTGGAAAGGCTTTTAATCGCGGCTCATCCCTCACACATCATCAAAGGATTCAT 
ACTGGGAGAAACCCTACCATTGTAACAGATGTGGGAAGACCTTTTATGACTGCACAGA 
CTTCAGTCAACATCCAGGAACTTTTATTAGGGAAAGAGTTTTTGAATATCACCACTGA 
AGAAAATCTGTGGTGAAAGGGAACATCTTACCATCTGGCCATTCACACTGAAGAGAAA 


CTTCATAAGCATCCTCTCTTTGAGAAAAC 




ORF Start: ATG at 7 


ORF Stop: TGA at 1696




SEQ ID NO: 330 


563 aa 


MW at 64300.6kD 


NOV117a, 

CG59807-01 Protein Sequence 


MLETCGLLMSLGCPLFKPELIYQLDHRQELWMATKDLSQSSYPGDNTKPKTTEPTFSH 
LALPEEVLLQEQLTQGASKNSQLGQSKDQDGPSEMQEVHLKIGIGPQRGKLLEKMSSE 
RDGLGSDDGVCTKITQKQVSTEGDLYECDSHGPVTDALIREEKNSYKCEECGKVFKKN
ALLVQHERIHTQVKPYECTECGKTFSKSTHLLQHLIIHTGEKPYKCMECGKAFNRRSH
LTRHQRIHSGEKPYKCSECGKAFTHRSTFVLHHRSHTGEKPFVCKECGKAFRDRPGFI
RHYIIHTGEKPYECIECGKAFNRRSYLTWHQQIHTGVKPFECNECGKAFCESADLIQH
YIIHTGEKPYKCMECGKAFNRRSHLKQHQRIHTGEKPYECSECGKAFTHCSTFVLHKR
THTGEKPYECKECGKAFSDRADLIRHFSIHTGEKPYECVECGKAFNRSSHLTRHQQIH
TGEKPYECIQCGKAFCRSANLIRHSIIHTGEKPYECSECGKAFNRGSSLTHHQRIHTG
RNPTIVTDVGRPFMTAQTSVNIQELLLGKEFLNITTEENLW 



Further analysis of the NOV117a protein yielded the following properties shown in
Table 117B. 



Table 117B. Protein Sequence Properties NOV117a 


Psort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


Likely cleavage site between residues 19 and 20 



A search of the NOV117a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 117C.
Table 117C. Geneseq Results for NOV117a


Geneseq 
Identifier


Protein/Organism/Length [Patent #, 
Date] 


NOV117a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM79549


Human protein SEQ ID NO 3195 -
Homo sapiens, 603 aa.
[WO200157190-A2, 09-AUG-2001]


1..563 
38..603 


563/566 (99%)
563/566 (99%) 


0.0 


AAM78565


Human protein SEQ ID NO 1227 -
Homo sapiens, 603 aa.
[WO200157190-A2, 09-AUG-2001]


1..563
38..603


563/566 (99%) 
563/566 (99%) 


0.0 


ARR21767 


PrAtf*in it'\'lf\f% f*nf*nHpH Hv nrnbp fnr 

X Ivslwlll 7T_? / wVJ UlvvUvU U V LflUUw 1U1 

measuring heart cell gene expression - 
Homo sapiens, 551 aa. 
[WO200157274-A2, 09-AUG-2001] 


44..562
10..527 


375/519 (72%) 
437/519(83%) 


0.0


AAM69575


Human bone marrow expressed probe
encoded protein SEQ ID NO: 29881 - 
Homo sapiens, 551 aa. 
[WO200157276-A2, 09-AUG-2001] 


44..562
10..527 


375/519(72%) 
437/519(83%) 


0.0 


AAM57172


Human brain expressed single exon 
probe encoded protein SEQ ID NO: 
29277 - Homo sapiens, 551 aa. 
[WO200157275-A2, 09-AUG-2001] 


44..562 
10..527


375/519(72%) 
437/519(83%) 


0.0 



In a BLAST search of public sequence databases, the NOV117a protein was found to
have homology to the proteins shown in the BLASTP data in Table 117D.



Table 117D. Public BLASTP Results for NOV117a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV117a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


O43296


Zinc finger protein 264 - Homo 
sapiens (Human), 627 aa. 


1..562 
43..603 


401/562 (71%) 
468/562 (82%) 


0.0 


Q96NL3 


CDNA FLJ30663 FIS, CLONE 
FCBBF1000598, MODERATELY
SIMILAR TO ZINC FINGER 
PROTEIN 84 - Homo sapiens 
(Human), 588 aa. 


1..535 
38..572 


299/535 (55%) 
369/535 (68%) 


0.0 


Q99676 








e-151 
sapiens (Human), 751 aa.


58..595 


355/542 (65%) 




Q96SE7 


ZINC FINGER 1 1 1 1 - Homo sapiens 
(Human), 839 aa. 


151..541 
306..694 


233/391 (59%) 
281/391 (71%) 


e-148 


Q03923


Zinc finger protein 85 (Zinc finger 
protein HPF4) (HTF1) - Homo sapiens 
(Human), 595 aa. 


1..535 
33..547 


266/544(48%) 
328/544(59%) 


e-148 



PFam analysis predicts that the NOV117a protein contains the domains shown in
Table 117E.



Table 117E. Domain Analysis of NOV117a 


Pfam Domain 


NOV117a Match
Region


Identities/
Similarities
for the Matched
Region


Expect 
Value 


KRAB: domain 1 of 1


1 *\A 


1 Alf\& (0 1 o/ A \ 
l*t/00 \Z. 1 /oj 

24/66 (36%) 


n i < 


zf-C2H2: domain 1 of 13


1 60 1 RA 


1 1 /0d (Af\%\ 

19/24 (79%) 


j.OC-VO 


zf-C2H2: domain 2 of 13


190..212


1 1 /OA (A£S>/\ 

19/24 (79%) 


/.ie-uo 


zf-C2H2: domain 3 of 13


218..240


1*1/ Z4 {JO /o) 

22/24(92%) 


t la m 

z.je-u/ 


zf-BED: domain 1 of 3


20^ 241 


1"V52 ^5%^ 
25/52 (48%) 




zf-C2H2: domain 4 of 13


246..268


11/24(46%) 
20/24 (83%) 


4.6e-05 


LIM: domain 1 of 1 


220..284 


16/72 (22%) 
50/72 (69%) 


0.69 


zf-C2H2: domain 5 of 13 


274..296 


8/24 (33%) 
18/24 (75%) 


7.6e-05 


zf-C2H2: domain 6 of 13 


302..324 


11/24(46%) 
20/24 (83%) 


8.4e-05 


Zn_carbOpept: domain 1 of 1


312..330 


5/19 (26%) 
17/19(89%) 


1.2 


zf-C2H2: domain 7 of 13 


330..352 


8/24 (33%) 
19/24(79%) 


9.7e-05 


zf-C2H2: domain 8 of 13 


358..380 


14/24 (58%) 
22/24 (92%) 


5.3e-07 
zf-BED: domain 2 of 3


343..381 


12/52 (23%) 
26/52 (50%) 


1.3 


zf-C2H2: domain 9 of 13 


386..408 


11/24 (46%)
20/24 (83%) 


9.4e-05 


zf-C2H2: domain 10 of 13 


414..436


11/24(46%) 
20/24 (83%) 


5e-06 


zf-C2H2: domain 11 of 13 


442..464


12/24(50%) 
22/24 (92%) 


3e-07 


zf-BED: domain 3 of 3 


427..465 


14/52 (27%) 
27/52 (52%) 


0.38 


zf-C2H2: domain 12 of 13 


470..492 


12/24 (50%) 
19/24 (79%) 


0.00044 


zf-C2H2: domain 13 of 13 


498..520 


12/24 (50%) 
22/24 (92%) 


9.8e-07 



Example 118. 

The NOV118 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 118A.



Table 118A. NOV118 Sequence Analysis




SEQ ID NO: 331


1899 bp 


NOV118a, 

CG59805-01 DNA Sequence


CAAACTCTACTACCTCTATATGACATTTCAGGTCTCTGTGACCTTTGATGATGTGGCT
GTGACTTTCACCCAGGAGGAGTGGGGCCAGCTGGACCTAGCTCAGCGGACCCTGTACC 
AGGAGGTGATGCTGGAAAACTGTGGGCTCCTGGTATCTCTGGGTGGGTGTCCTGTTCC 
CAGACCTG AGCTG AT CT ACCACCTAGAGCATGGG CAGGAGCCATGGAC CAGG AAGG AA 
GACCTCTCCCAAGGCACCTGTCCAGGTGACAAAGGAAAACCCAAGAGCACAGAACCTA 
CCACCrGTGAGCTAGCCTTGTCTGAAGGAATCTCTTTTTGGGGACAACTAACACAAGG 
AG CTTCAGGGGACTCCCAGTTGGGGCAACCC AAGGAT CAGG ATGGGTTTTCAGAAATG 
CAGGGAGAACGCTTGAGACCAGGGTTAGATTCCCAAAAGGAGAAGCTTCCTGGAAAAA 
TGAGCCCCAAACATGATCGTTTAGGGACAGCTGATAGTGTGTGTTCAAGGATTATACA 
GGATCGAGTCTCCTTAGGAGATGATGTCCATGACTGTGACTCACATGGATCAGGTAAA 
AATCCAGTTATTCAGGAAGAGGAAAATATCTTTAAATGCAATGAATGTGAAAAAGTGT 
TTAACAAGAAACGCCTGCTTGCTCGGCATG AG AGGAT T CACTCTGGAGTGAAGCCCTA 
TGAATCCACAGAGTGTGGAAAAACCTTTAGCAAGAGTACATACCTCCTGCAGCACCAC 
ATGGTCCACACI^»GGGAGAAGCCCTATAAGTGCATX3GAGTGTX^GAAGGCTTTTAATC 
GG AAGT CACACCTTACCCAGCACCAGCGGATT C ACAGTGG AG AG AAGCCTTATAAGTG 
CAGTGAATGTGGAAAGGCCTTCACCCACCGCTCCACTTTTGTCTTGCATAACAGGAGC 
CACACT GG AG AAAAAC C CTTTGTG TG CAAAG AG TGTGG CAAAG C CTTT CG AG AT AGG C 
CAGGTTTCATTCGACACTACATCATCCACAGTGGTGAGAATCCCTACGAGTGCTTCGA 
ATGTGGCAAGGTCTTCAAACACAG AT CATAC CTCATGTGGC ACCAGCAG ACTCAT ACC 
GGGGAGAAGCCCTATGAGTGCAGTGAATGTGGGAAGGCCTTCTGTGAGAGCGCAGCGC 
TGATTCACCACTATGTCATCCACACTGGAGAGAAGCCCTTTGAGTGCCTCGAGTGTGG 
GAAGGCTTTCAACCACCGATCCTACCTCAAAAGGCACCAGCGGATTCACACTGGGGAG 
AAGCCATATGTCTGTAGTGAATGCGGAAAGGCCTTCACCCACTGCTCTACTTTCATCT 
TGCATAAAAGGGCCCACACTGGAGAAAAACCTTTCGAGTGCAAAGAGTGTGGGAAAGC 
CTTTAGCAATAGGGCAGACCTCATTCGCCACTTCAGCATCCACACTGGAGAGAAGCCC 
TATGAGTGCATGGAGTGTGGAAAGGCCTTCAACCGCAGGTCAGGCCTCACAAGGCACC 
AGCGGATTCATAGTGGAGAGAAGCCCTATCAATCCATCGAGTGTGGGAAAACATTTTG 
CTGGAGCACAAACCTCATTCGACACTCTATCATCCACACTGGAGAGAAGCCGTATGAG 
TG CAGTG AATGTGG AAAG G CCTTC AG TCG C AG CT CGT CC CT C AC T C AG CAT CAAAG G A 
TGCATACTG^GAGAAATCCTATCAGTGTAACAGATGTGGGAAGACCTTTTACAAGTGG 
GCAGACCTCAGTCAACATCC^GAACTTTTATTGGGGAAAAACTTTTTGAATGTCACC 
ACTGAGGAAAATCTTTTGCAAGAGGAAGCATCTTACATGGCATCTGATCGTACATACC 
AAAGAGAAACCCCACAAGTGTCTTCACTGTGAGAAAACCTTCT 




ORF Start: ATG at 20 


ORF Stop: TGA at 1886


SEQ ID NO: 332


622 aa 


MW at 70677.2kD 


NOV118a, 

CG59805-01 Protein Sequence 


MTFQVSVTFDDVAVTFTQEEWGQLDLAQRTLYQEVMLENCGLLVSLGGCPVPRPELIY 
HLEHGQE PWTRKEDLSQGTCPGDKGKPKSTE PTTCELALS EG I S FWGQLTQGASGDSQ 
LGQPKDQDGFSEMQGERLRPGLDSQKEKLPGKMSPKHDGLGTADSVCSRIIQDRVSLG 
DDVHDCDSHGSGKNPVIQEEENIFKCNECEKVFNKKRLLARHERIHSGVKPYECTECG 
KT FS KSTYLLQHHMVHTG E KP YKCMECG KAFNRKSH LTQHQR I H SGEKP YKCS E CG KA 
FTHRSTFVLHNRSHTGEKPFVCKECGKAFRDRPGFIRHYIIHSGENPYECFECGKVFK 
HRSYLMWHQQTHTGEKPYECSECGKAFCESAALIHHYVIHTGEKPFECLECGKAFNHR 
SYLKRHQRIHTGEKPYVCSECGKAFTHCSTFILHKRAHTGEKPFECKECGKAFSNRAD 
L I RH FS I HTG E KP YECME CG KAFNRRS G LTRHQR IHSGEKPYECIECG KT FCWSTNLI 
RHSIIHTGEKPYECSECGKAFSRSSSLTQHQRMHTGRNPISVTDVGRPFTSGQTSVNI 
QELLLGKNFLNVTTEENLLQEEAS YMASDRT YQRET PQVS S L 
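
The ORF coordinates and the protein length reported in Table 118A can be cross-checked arithmetically. The sketch below assumes that both positions are 1-based and point at the first base of the ATG and of the stop codon respectively; the table does not state this convention, and the function name is illustrative only.

```python
def orf_protein_length(orf_start: int, orf_stop: int) -> int:
    """Return the number of residues encoded between a start and a stop codon.

    Assumption (not stated in the source tables): both positions are 1-based
    and refer to the first base of the ATG and of the stop codon.
    """
    coding_bases = orf_stop - orf_start
    if coding_bases <= 0 or coding_bases % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return coding_bases // 3


# Values from Table 118A: ATG at 20, TGA at 1886, protein of 622 aa.
print(orf_protein_length(20, 1886))
```

Under that assumption, (1886 - 20) / 3 gives 622 codons, which agrees with the 622 aa stated for SEQ ID NO: 332.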



Further analysis of the NOV118a protein yielded the following properties shown in
Table 118B.



Table 118B. Protein Sequence Properties NOV118a


PSort
analysis: 


0.4500 probability located in cytoplasm; 0.3796 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
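
The PSort line above is a semicolon-separated list of scored compartments. As a minimal sketch, assuming the phrasing "<score> probability located in <compartment>" is used throughout, the entries can be parsed and ranked as follows; the function name is hypothetical. Note that the scores are reported per compartment and need not sum to 1.

```python
import re


def parse_psort(text: str) -> list[tuple[str, float]]:
    """Parse '<p> probability located in <place>; ...' into ranked (place, p) pairs."""
    pattern = re.compile(r"(\d*\.\d+)\s+probability\s+located\s+in\s+([^;]+)")
    flat = " ".join(text.split())  # undo the line wrapping of the table cell
    hits = [(place.strip(), float(p)) for p, place in pattern.findall(flat)]
    # Highest score first; ties keep their original order.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


psort_118a = """0.4500 probability located in cytoplasm; 0.3796 probability located in microbody
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000
probability located in lysosome (lumen)"""

for place, score in parse_psort(psort_118a):
    print(f"{score:.4f}  {place}")
```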



A search of the NOV118a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 118C.



Table 118C. Geneseq Results for NOV118a


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV118a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


ABB22693


Protein #4692 encoded by probe for
measuring heart cell gene expression
- Homo sapiens, 468 aa.
[WO200157274-A2, 09-AUG-2001]


81..548 
1..468 


468/468 (100%) 
468/468 (100%) 


0.0 


AAM70526


Human bone marrow expressed probe 
encoded protein SEQ ID NO: 30832 - 
Homo sapiens, 468 aa. 
[WO200157276-A2, 09-AUG-2001] 


81..548 
1..468


468/468 (100%) 
468/468 (100%) 


0.0 


AAM58080 


Human brain expressed single exon 
probe encoded protein SEQ ID NO: 
30185 - Homo sapiens, 468 aa. 
[WO200157275-A2, 09-AUG-2001] 


81..548 
1..468 


468/468 (100%) 
468/468 (100%) 


0.0 


AAM30843


measuring placental gene expression
- Homo sapiens, 468 aa.
[WO200157272-A2, 09-AUG-2001]


1..468


468/468 (100%)


0.0


AAM18364 


Peptide #4798 encoded by probe for 
measuring cervical gene expression - 
Homo sapiens, 468 aa. 
[WO200157278-A2, 09-AUG-2001] 


81..548
1..468 


468/468 (100%) 
468/468 (100%) 


0.0 
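
The residue ranges in Table 118C can be read as inclusive, 1-based intervals. The following sketch (helper name and the coverage calculation are illustrative assumptions, not part of the source) shows that the range 81..548 on NOV118a and the range 1..468 on the matched entry both span 468 residues, consistent with the 468/468 alignment length, and estimates the fraction of the 622-aa NOV118a sequence covered by the match.

```python
def span_length(residues: str) -> int:
    """Length of an inclusive, 1-based residue range written as 'start..end'."""
    start, end = (int(part) for part in residues.split(".."))
    if end < start:
        raise ValueError(f"range runs backwards: {residues}")
    return end - start + 1


query_span = span_length("81..548")  # NOV118a residues
match_span = span_length("1..468")   # residues of the Geneseq entry
query_coverage = query_span / 622    # NOV118a is 622 aa (Table 118A)
print(query_span, match_span, round(query_coverage, 3))
```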


In a BLAST search of public sequence databases, the NOV118a protein was found to
have homology to the proteins shown in the BLASTP data in Table 118D.


Table 118D. Public BLASTP Results for NOV118a 


Protein
Accession
Number


Protein/Organism/Length 


NOV118a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


O43296


Zinc finger protein 264 - Homo 
sapiens (Human), 627 aa. 


4..622
11..627


530/619 (85%) 
567/619 (90%) 


0.0 


Q96NL3 


CDNA FLJ30663 FIS, CLONE 
FCBBF1000598, MODERATELY 
SIMILAR TO ZINC FINGER 
PROTEIN 84 - Homo sapiens 
(Human), 588 aa. 


7..572 
9..573 


334/566 (59%) 
403/566(71%) 


0.0 


Q99676


Zinc finger protein 184 - Homo 
sapiens (Human), 751 aa. 


2..571 
23..623 


280/604 (46%) 
377/604 (62%) 


e-160 


P51523


Zinc finger protein 84 (Zinc finger 
protein HPF2) - Homo sapiens 
(Human), 738 aa. 


4..617 
5..626 


286/637 (44%) 
368/637 (56%) 


e-157 


Q9BX82 


EZFIT-RELATED PROTEIN 1 - 
Homo sapiens (Human), 626 aa. 


7..617 
14..626 


278/621 (44%) 
364/621 (57%) 


e-156 
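
Expect values in these tables appear both in full form (for example 0.0) and in a shorthand without a mantissa (for example e-160). A minimal sketch for comparing hits numerically, assuming the shorthand stands for 1e-160, is given below; the function name and the small dictionary of hits are illustrative.

```python
def parse_evalue(text: str) -> float:
    """Convert a BLAST-style Expect string to a float.

    Assumption: a value written without a mantissa, such as 'e-160',
    is shorthand for 1e-160.
    """
    cleaned = text.strip().lower()
    if cleaned.startswith("e"):
        cleaned = "1" + cleaned
    return float(cleaned)


# Accession -> Expect value, copied from four rows of Table 118D.
hits = {"Q99676": "e-160", "P51523": "e-157", "Q9BX82": "e-156", "Q96NL3": "0.0"}
ranked = sorted(hits, key=lambda accession: parse_evalue(hits[accession]))
print(ranked)
```

Sorting in ascending order places the most significant hit (smallest Expect value) first.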



PFam analysis predicts that the NOV118a protein contains the domains shown in
Table 118E.

Table 118E. Domain Analysis of NOV118a


Pfam Domain                 NOV118a Match Region    Identities/Similarities for the Matched Region    Expect Value
KRAB: domain 1 of 1         7..70                   41/66 (62%), 54/66 (82%)                          2.2e-33
zf-C2H2: domain 1 of 13     198..220                11/24 (46%), 17/24 (71%)                          3.9e-05
BolA: domain 1 of 1         161..238                14/88 (16%), 49/88 (56%)                          3.4
zf-C2H2: domain 2 of 13     226..248                10/24 (42%), 18/24 (75%)                          6.2e-05
zf-C2H2: domain 3 of 13     254..276                14/24 (58%), 22/24 (92%)                          5e-07
TFIIS: domain 1 of 1        257..292                12/39 (31%), 21/39 (54%)                          5.7
zf-C2H2: domain 4 of 13     282..304                11/24 (46%), 20/24 (83%)                          3.7e-05
LIM: domain 1 of 1          256..320                14/71 (20%), 48/71 (68%)                          0.38
zf-C2H2: domain 5 of 13     310..332                8/24 (33%), 18/24 (75%)                           7.6e-05
zf-C2H2: domain 6 of 13     338..360                11/24 (46%), 19/24 (79%)                          1.1e-05
zf-C2H2: domain 7 of 13     366..388                9/24 (38%), 18/24 (75%)                           0.00027
zf-C2H2: domain 8 of 13     394..416                12/24 (50%), 21/24 (88%)                          7.9e-07
zf-C2H2: domain 9 of 13     422..444                10/24 (42%), 19/24 (79%)                          0.00014
zf-C2H2: domain 10 of 13    450..472                10/24 (42%), 20/24 (83%)                          8.3e-06
zf-C2H2: domain 11 of 13    478..500                13/24 (54%), 21/24 (88%)                          3e-07
zf-BED: domain 1 of 1       463..501                14/52 (27%), 29/52 (56%)                          0.1
zf-C2H2: domain 12 of 13    506..528                11/24 (46%), 17/24 (71%)                          0.0016
zf-C2H2: domain 13 of 13    534..556                13/24 (54%), 23/24 (96%)                          7.2e-08





Example 119. 

The NOV119 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 119A.



Table 119A. NOV119 Sequence Analysis 




SEQ ID NO: 333


1546 bp 


NOV119a, 

CG59928-01 DNA Sequence 


GCTCAGTAGGCGTCGGGCTGTGATGCCCCAACTGCTCCAGCGTCTGCAGGCGCGCGCG 


GGCGCGGTAGGCGTACTCGCTGGCCGGATAGCGCGTGATGATGAACTGGTAGGTCTGC 


GCCGCATCGACGAACAGGCTCTCGCGCTCCAGGCATTGACCGCGCAGCAGGGAAATCT 


Q,Q.Gv>L 1 (jLAGG TAAT rCaCQj rGAQ»CGC»C_TC_T rW-GCTCGGCCTGCGACAGCTT^ 


GACACGGGCGCAATCGCCTT CGTTGTAGGCG CGATAGGCGTTGTTCAGATGATGGTCG 


AGCG AGACACGGGTGCAACC CGCAGCAACCAGGG CCACGGCCAG AATGATCAGGTTAC 


G CATGGGCAATT CCTCCAATGAGC AGTGTAT CG ACAGCCCAGGCAAAAACTG AAC AG C 


GGCAAGCCGACGACGGTTTTTCTGGCGG CGC CTTGGC ATGACGC CACTGCCTCTCATT 


TTATCAACGCCAGCGCCACGACCGCTCGTCCTCTCGAACCAGCGCTAAATCCCCTTCT 


GCGCTGACCCATATCAATGCCGTTCAGCGCAACAGGGTGTGTAATGTAGGTACAGACT 


CCAGGCGAGGACGCTGCCATOAAACTGCAACGACTGTTGGTCGTCATCGACGCCGAAC 
ACCAGCAACAACCCG CCCTGCAACGCGC AG CCGATGTGGCACGC AAGACCGGCG CCG A 
ATTGCACCTGTTGCAGATCGAATACCACCCAAGCCTGGAAAGCGGCCTGCTGGACAGC 
CATCTGCTCAACCGCGCCCGTGAAACCATCCTGCGACAGAGCCACGAGGCCCTGCGTG 
CCAGCGTCGCTCACCTGAGCGATGAAGGATTCAAGATCGCAGTGGACGTGCGCTGGGG 
CAAACGTCGTCATGAAGAAATCCTCGCCCGCGTCGCGGTGTTGCAACCGGACATCCTG 
TTCAAGTCGACTCATCCCAGCAGTGCGCTGCGCCGCCTGTTGTTCAGTGATACCAGTT 
GGCAGCTGATTCGCCGCAGCCCGGTGCCGCTGTGGCTGGTACACGACGCCGAGCCCCA 
TGGTCAGAGCCTGTGCGCTGC<3CTCGACCCGCTGCACAGCGCGGACAAACCTGCCGCC 
CTCGATCAT CAGTTGATTG ATGCCAG CCAG ACC CTGC AGG CCGAGCTCGGCTTACAGG 
CCCIAATACCTGCATGCACAGGCGCCTCTGCCGCGGTCGCTGCTGTTCGACGCCGAGGT 
AGCGCAGGAATATGAAGACTACGTGACCCAGTGCAGCCGCGAGCACCGCGAAGCCTTC 
GACAAGCTGATCGCCCAGCACGCCATCGATAGAGCACAGGCCCACCTGTTGGACGGTT 
TTGCCG AGG AAGTCATCCCGCGTTTCGTGCGTGAG CACAATATAGG CCTGCTGGTG AT 
GGGCGCCATCGCCCGCGGCCATCTGGACAGCCTGCTGATCGGCCACACCGCAGAACGG 
GTGCTGG AACGTGTCGAGTGCG AT CTGCTGGTGAT C AAATCG CACGGCAAAGGGT AGT 
G CACAGG AACAATGACTACAGC CCGACGCCTACTG AGC 




ORF Start: ATG at 599 


ORF Stop: TAG at 1505 




SEQ ID NO: 334 


302 aa MW at 33922.3kD 


NOV119a, 

CG59928-01 Protein Sequence 


MKLQRLLVVIDAEHQQQPALQRAADVARKTGAELHLLQIEYHPSLESGLLDSHLLNRA 
RETI LRQSHEALRAS VAHLSDEGFK I AVDVRWGKRRHEEI LARVAVLQPD I LFKSTHP 
S S ALRRLL FS DTS WQL I RRS P V PLWLVHDAE PHGQS LCAALD PLH S ADKP AALDHQL I 
D ASQT LQAELGLQ AQ YLiHAQAP L PRS LL FD AEVAQE YE DYVTQC SREHREAFD KL I AQ 
HA I D RAQ AH LLDG F AE E V I P RFVREHN I G LL VMGA I ARG H LD S LL I GHT AE R VL E R VE 
CDLLVI KSHGKG 
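
The molecular weight reported with each protein sequence can be approximated from the amino acid composition. The sketch below is an illustration, not the method used to produce the tabulated values: it assumes standard average residue masses and adds one water molecule for the free termini. The MW figures in these tables appear to be in daltons despite the "kD" label, since a 302-residue protein weighs roughly 33 kDa. Because the sequence text here is OCR-derived, no attempt is made to reproduce the tabulated figure exactly.

```python
# Approximate average residue masses in daltons (standard textbook values).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water is added for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Average molecular weight in daltons of an unmodified polypeptide."""
    seq = "".join(protein.split()).upper()  # drop spaces and line breaks
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognized residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# First line of the NOV119a protein sequence, used only as sample input.
fragment = "MKLQRLLVVIDAEHQQQPALQRAADVARKTGAELHLLQIEYHPSLESGLLDSHLLNRA"
print(f"{molecular_weight(fragment):.1f} Da for {len(fragment)} residues")
```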



Further analysis of the NOV119a protein yielded the following properties shown in
Table 119B. 



Table 119B. Protein Sequence Properties NOV119a 


PSort 
analysis: 


0.3000 probability located in microbody (peroxisome); 0.3000 probability 
located in nucleus; 0.2014 probability located in lysosome (lumen); 0.1000 
probability located in mitochondrial matrix space 


SignalP 
analysis: 


No Known Signal Sequence Predicted



A search of the NOV119a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 119C.



Table 119C. Geneseq Results for NOV119a 






Geneseq
Identifier


Protein/Organism/Length [Patent
#, Date]


NOV119a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Region


Expect
Value


No Significant Matches Found



In a BLAST search of public sequence databases, the NOV119a protein was found to
have homology to the proteins shown in the BLASTP data in Table 119D.



Table 119D. Public BLASTP Results for NOV119a 


Protein 
Accession 
Number 


Protein/Organism/Length


NOV119a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9HW73 


HYPOTHETICAL PROTEIN 
PA4328 - Pseudomonas aeruginosa,
304 aa. 


1..297
1..299 


156/299 (52%) 
200/299 (66%) 


le-79 


Q9KS28 


HYPOTHETICAL PROTEIN 
VC1433 - Vibrio cholerae, 315 aa. 


5..300 
6..304 


78/302 (25%)
147/302 (47%) 


4e-29 


CAC91106 


PUTATIVE STRESS PROTEIN - 
Yersinia pestis, 318 aa.


2..300 
3..303 


93/310(30%) 
137/310(44%) 


2e-28 


AAL20579 


PUTATIVE UNIVERSAL STRESS
PROTEIN - Salmonella 
typhimurium LT2, 315 aa. 


4..297
5..300 


91/305 (29%) 
139/305 (44%) 


2e-28 


CAD01669 


CONSERVED HYPOTHETICAL 
PROTEIN - Salmonella enterica 
subsp. enterica serovar Typhi, 315 
aa. 


4..297 
5..300 


91/305 (29%) 
139/305 (44%) 


3e-28 



PFam analysis predicts that the NOV119a protein contains the domains shown in
Table 119E.



Table 119E. Domain Analysis of NOV119a

Pfam Domain                 NOV119a Match Region    Identities/Similarities for the Matched Region    Expect Value
Usp: domain 1 of 2          2..144                  28/153 (18%), 92/153 (60%)                        0.0014
Usp: domain 2 of 2          160..297                28/153 (18%), 88/153 (58%)                        0.013



Example 120. 

The NOV120 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 120A.



Table 120A. NOV120 Sequence Analysis 




SEQ ID NO: 335 


2202 bp 


NOV120a,

CG59947-01 DNA Sequence 


CACCCTCCCGCCCCGCCCCCCGTCCAATGCTGAGCTCAGTCTGCGTCTCGTCCTTCCG 
CGGGCG CCAGGGGGCCAG CAAGCAGCAGCCGG CG CCACCGCCGCAGCCGCC CGAGGTC 
CCCGGTGGCGACAGCGGCAAGATCGTGATCAACGTGGGCGGCGTGCGCCATGAGACGT 
ACCGCTCGACGCTGCGCACCCTGCCGGGGACGCGGCTGGCCGGCCTGACGGAGCCCGA 
GGCGGCGGCACGCTTCGACTACGACCCXKX3CGCCGACGAGTTCTTCTTTGACCGGCAC 
CCGGGAGTCTTCGCGTACGTGCTCAACTACTACCGCACCGGCAAGCTGCACTGCCCAG 
CCGACGTGTCCGGGCCCCTCTTTGAGGAGGAGCTCGGCTTCTGGGGCATCGACGAGAC 
CGACGTGGAGGC CTGCTG CTGGATGACCTACCGG CAGCATCG CGACG CTG AGGAGGCG 
CTCGACTCCTTCGAGGCGCCCGACCCCGCGGGCGCCGCCAACGCCGCCAACGCCGCAG 
GCGCCCACGACGGAGGCCTGGACGACGAGGCGGGCGCGGGCGGCGGCGGCCTGGACGG 
AGCGGGCGGCGAGCTCAAGCGCCTCTGCTTCCAGGACGCGGGCGGCGGCGCCGGGGGG 
C CG CCAGGGGGCGCGGGCGGCGCGGGCGGCACATGGTGGCGCCG CTGGCAG CCCCGCG 
TGTGGGCGCTCTTCGAGGACCCCTACTCGTCGCGGGCTGCCAGGTATGTGGCCTTCGC 
CTCCCTCTTCTTCATCCTCATCTCCATCACCACCTTCTGCCTGGAAACCCATGAGGGC 
TTCATCCATATTAGCAACAAGACGGTGACCCAGGCCTCCCCGATCCCCGGGGCACCTC 
CGGAGAACATCACCAACGTGGAGGTGGAGACGGAGCCCTTCCTGACCTACGTGGAGGG 
GGTGTGCGTCGTCrGGTTCACCTTCGAGTTCCTCATGCGCATCACCTTCTGCCCAGAC 
AAGGTGGAGTTTCTTAAAAGCAGCCTCAACATCATCGACTGTGTGGCCATCCTGCCCT 
TCTATCTCGAGGTGGGCCTCTCGGGCCTCAGCTCCAAGGCCGCCAAAGACGTGCTGGG 
CTTCCTGCGGGTGGTCCGCTTCGTCCGCATCCTGCGCATCTTCAAGCTGACCCGGCAC 
TTCGTGGGGCTGCGCGTGCTGGGACACACGCTCCGCGCCAGCACCAACGAGTTCCTGC 
TGCTCATCATCTTCCTGGCCCTGGGGGT<3CTCATCTTCGCCACCATGATTTACTACGC 
TX3AGCGCATTGGCGCCGACCCCGATGACATCCTGGGCTCCAACCACACCTACTTCAAG 
AAC AT CCCCATTGGCTTCTGGTGGG CTGTGGTCACCATG ACGACCCTGGGCT ATGGAG 
ACATGTACCCCAAGACGTGGTCGGGGATGCTGGTCGGGGCGCTGTGTGCCCTGGCGGG 
GGTCCTGACCATCGCCATGCCTGTGCCCGTCATTGTCAACAACTTTGGCATGTACTAT 
TCGCTGGCCATGGCCAAGCAGAAGCTGCCCAAGAAGAAGAACAAACACATCCCCCGGC 
CCCCGCAACCGGGCTCGCCCAACTACTGCAAGCCTGACCCACCCCCGCCACCCCCGCC 
CCACCCGCACCACGGCAGCGGGGGCATCAGCCCGCCGCCACCCATCACCCCACCCTCC 
ATGGGGGTGACTGTGGCCGGGGCCTACCCAGCGGGGCCCCACACGCACCCCGGGCTGC 
TCAGGGGGGGAGCX5GGTGGGCTGGGGATCATGGGGCTGCCTCCTCTGCCAGCCCCCGG 
CGAGCCTTGCCCGTTGGCTCAGGAGGAGGTGATTGAGATCAACCGGGCAGATCCTCGC 
CCCAATCGGGATCCGGCAGCAGCTCCGCTTGCCCACGAGGACTGCCCAGCCATTGACC 
AGCCTGCCATGTCCCCGGAAGACAAGAGCCCCATCACGCCTGGAAGCCGTGGCCGCTA 
TAGCCGGGACCGAGCCTGCTTCCTCCTCACCGACTATGCCCCTTCCCCTGATGGCTCC 
ATCCGAAAAGCCACTGGTGCTCCCCCACTGCCCCCCCAAGACTGGCGTAAGCCAGGCC 
CCCCAAGCTTCTTGCCCGACCTCAACGCCAACGCCGCGGCCTGGATATCCCCCTAGTG 
GACGAACCCCCTCCCCCCGGGCTCTTGTCACCGCCTGAGACCTCGCGAGACTTTCG 




ORF Start: ATG at 27 


ORF Stop: TAG at 2142 




SEQ ID NO: 336 


705 aa 


MW at 75590.5kD 


NOV120a,

CG59947-01 Protein Sequence


MLSSVCVSSFRGRQGASKQQPAPPPQPPEVPGGDSGKIVINVGGVRHETYRSTLRTLP
GTRIAGLTEPEAAARFDYDPGADEFFFDRHPGVFAYVLNYYRTGKLHCPADVCGPI.FE
EELGFWGI DETDVEACCWMTYRQHRDAEEALDS FEAPDPAGAANAANAAGAHDGGLDD 
EAGAGGGG LDGAGG E LKRLC FQDAGGG AGG P PGG AGG AGGTWWRRWQ PR VWAL FED P Y 
SSRAARYVAFASLFFI LI SITTFCLETHEGFIHI SNKTVTQAS PI PGAPPENITNVEV 
ETE PFLTYVEGVCWWFTFE FLMRI TFC PDKVEFLKSSLN 1 1 DCVAI LPFYLEVGLSG 
LSSKAAKDVIXSFI^VWFVRILRIFKLTRHFVGLRVLGHTLRASTNEFLLLIIFLA^ 
VL I FATM I YYAERIGAD PDDILGSNHTYFKNI P IGFWWAWTMTTLGYGDMYPKTWSG 
MLVGAIiCALAGVLTIAMPVPVIVNNFGMYYSIJ^MAKQKLPKKKNKHIPRPPQPGSPNY 
CKPDPPPPPPPHPHHGSGGISPPPPITPPSMGVTVAGAYPAGPHTHPGLLRGGAGGLG 
I MGLPP LP A PGE PC PLAQEE V I E I NRAD PR PNGD PAAAALAH EDC PA I DQ PAMS P ED K 
SPITPGSRGRYSRDRACFLLTDYAPSPDGSIRKATGAPPLPPQDWRKPGPPSFLPDLN 
ANAAAWISP 
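
The relationship between the DNA sequence, the stated ORF start, and the predicted polypeptide can be illustrated with a short translation routine. This is a sketch under stated assumptions: the standard genetic code is used, the ORF start is taken as a 1-based position, and the input is limited to the first 47 bases of the NOV120a DNA sequence so that OCR damage elsewhere in the listing does not affect the result. The function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, orf_start: int) -> str:
    """Translate from a 1-based ORF start to the first stop codon or the end."""
    seq = "".join(dna.split()).upper()  # drop spaces and line breaks
    protein = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 47 bases of the NOV120a DNA sequence; the ORF starts with the ATG at 27.
fragment = "CACCCTCCCGCCCCGCCCCCCGTCCAATGCTGAGCTCAGTCTGCGTC"
print(translate(fragment, 27))
```

The translated fragment begins MLSSVCV, matching the first residues of the NOV120a protein sequence.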



Further analysis of the NOV120a protein yielded the following properties shown in
Table 120B.



Table 120B. Protein Sequence Properties NOV120a


PSort 
analysis: 


0.6000 probability located in plasma membrane; 0.5071 probability located in 
mitochondrial inner membrane; 0.4000 probability located in Golgi body; 0.3000 
probability located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV120a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 120C.



Table 120C. Geneseq Results for NOV120a


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV120a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for
the Matched 
Region 


Expect 
Value 


AAY34120 


Human potassium channel K+Hnov4 
- Homo sapiens, 601 aa. 
[WO9943696-A1, 02-SEP-1999]


32..526
4..476 


371/510(72%) 
399/510(77%) 


0.0 


AAY32016 


Caenorhabditis elegans cation 
channel protein - Caenorhabditis 
elegans, 556 aa. [WO9947923-A2,
23-SEP-1999] 


33..512 
27..465 


217/486(44%) 
300/486 (61%)


e-113 


AAB86319 


Human Kv4.2 protein - Homo 
sapiens, 629 aa. [DE19963612-A1,
12-JUL-2001] 


16..521 
22..441 


173/511 (33%) 
256/511 (49%) 


5e-69 


AAY13523 


Amino acid sequence of KV4.2FL 
ion channel protein - Mammalia, 630 
aa. [WO9923880-A1, 20-MAY- 
1999] 


16..521 
23..442 


173/511 (33%) 
257/511 (49%) 


8e-68


AAW42996


Putative mature potassium channel 2 
protein - Homo sapiens, 494 aa. 
[US5710019-A, 20-JAN-1998] 


17..510 
4..425 


171/503 (33%)
240/503 (46%) 


2e-66 


In a BLAST search of public sequence databases, the NOV120a protein was found to
have homology to the proteins shown in the BLASTP data in Table 120D.


Table 120D. Public BLASTP Results for NOV120a 


Protein 
Accession 
Number 


Protein/Organism/Length


NOV120a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q14003 


Voltage-gated potassium channel 
protein Kv3.3 (KSHIIID) - Homo
sapiens (Human), 757 aa.


1..705
1..757 


704/757 (92%) 
704/757 (92%) 


0.0 


Q01956 


Voltage-gated potassium channel 
protein Kv3.3 (KSHIIID) - Rattus 
norvegicus (Rat), 889 aa. 


1..693 
1..756


663/757 (87%) 
668/757 (87%) 


0.0 


Q63959 


Voltage-gated potassium channel 
protein Kv3.3 (KSHIIID) - Mus 
musculus (Mouse), 769 aa. 


1..671 
1..724 


650/725 (89%) 
653/725 (89%) 


0.0 


A42073 


potassium channel protein Kv3.3 - 
mouse, 679 aa. 


32..607 
8..581 


557/576 (96%) 
559/576 (96%) 


0.0 


Q9PVD1 


KV3.1 POTASSIUM CHANNEL - 
Xenopus laevis (African clawed 
frog), 592 aa. 


34..671 
6..547 


441/640 (68%) 
479/640 (73%) 


0.0 



PFam analysis predicts that the NOV120a protein contains the domains shown in
Table 120E.



Table 120E. Domain Analysis of NOV120a 


Pfam Domain                 NOV120a Match Region    Identities/Similarities for the Matched Region    Expect Value
K_tetra: domain 1 of 1      36..137                 50/112 (45%), 86/112 (77%)                        1.6e-47
thaumatin: domain 1 of 1    314..319                4/6 (67%), 6/6 (100%)                             0.7
ion_trans: domain 1 of 1    295..486                51/231 (22%), 155/231 (67%)                       2.1e-29





Example 121. 

The NOV121 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 121A.



Table 121A. NOV121 Sequence Analysis




SEQ ID NO: 337 


1943 bp 


NOV121a, 

CG59938-01 DNA Sequence 


AGAT C CACGTG ATCT CC AAAG ACC C CTG T TG TGTTG TGT TGGG AGGTG G AT C CTG AAT 


CCACCCAGAGAAGCCTG AT ACCAATAAAATC CCTG CTTGCTTT CCAGGAG ACCCTTGG 


T CT TC ATGT CTTTGG TGTGTG CACTCTTGAAC ACATG CC AGG CAC A CAGGGTG CATG A 
CGACAAGCCTAATATTGTCCTAATCATGGTTGATGACCTGX^TATIGGAGATCTGGGC 
TGCTACGGCAATGACACCATGAGGACGCCTCACATCGACCGCCTTGCCAGGGAAGGCG 
TGCGACTGACTCAGCACATCTCTGCCGCCTCCCTCTGCAGCCCAAGCCGGTCCGCGTT 
CTTGACGGGAAGATACCCCATCCGATCAGGTATGGTTTCTAGTGGTAATAGACGTGTC 
ATCCAAAATCTTGCAGTCCCCGCAGGCCTCCCTCTTAATGAGACAACACTTGCAGCCT 
TGCTAAAGAAGCAAGGATACAGCACGGGGCTTATAGGTAAGTTAGGCAAATGGCACCT 
GGGTTTGAGCTGCGCCTCTCGGAATGATCACTGTTACC^CCCGCrCAACCATGGTTTT 

r* npTa PTTTTflCfVV^TfiPPTTTTfifi Af'TTTTAAGC'fi &CTGCC ACSGCATCe^AAGACAC 

CAGAACTGCACCGCTGGCTCAGGATCAAACTGTGGATCTCCACGGTAGCCCTTGCCCT 

GGTTCCTTTTCTGCTTCT(^TTCCCAAGTTCGCCCGCTGGTTCTCAGTGCCATGGAAG 

GTCATCTTTGTCTTTGCTCTCCTCGCCTTTCTGTTTTTCACTTC 

ATGG ATTT ACTCG AC GTTGG AATTG CATC CTT ATG AGG AAC CATG AAATT AT C CAG C A 

GCCAATGAAAGAGGAGAAAGTAGCTTCCCTCATGCTGAAGGAGGCACTTGCTTTCATT 

GAAAGGTACAAAAG^GAACCTTTTCTCCTCTTTTTTTCCTTCCTGCACGTACATACTC 

(^CTCATCTCCAAAAAGAAGTTTGTTGGGCGC^GTAAATATGGC^C^TATGGGGACAA 

TGTAGAAGAAATGGATTGGATGGTGGGTGGTAAAATCCTGGATGCCCTGGACCAGGAG 

CGCCTGGCCAACCACACCTTGGTGTACTTCACCTCTGACAACGGGGGCCACCTGGAGC 

CCCTGGACGGGGCTGTTCAGCTGGGTGGCTGGAACGGGATCTAa^GGTGGOWVGG 

AATGGGAGGATGGGAAGGAGGTATCCGTGTGCCAGGGATATTCCGGTGGCCGTCAGTC 

TTGG AGG CTGGGAGAGTGAT C AATG AGC C CAC C AG CTT AATGG ACAT CT AT C CGACG C 

TGTCTTATATAGGCGGAGGGATCTTGTCCCAGGACAGAGTGATTGACGGCCAGAACCT 

AATGCC CCTGCTGGAAGGAAGGGCGTCCC ACTCCGAC CACGAGTTCCTCTT CCACTAC 

TGTGGGGTCTATCTGCACACGGTCAGGTGGCATCAGAAGGACACTGTGTGGAAAGCTC 

ATTATGTGACTCCTAAATTCTACCCTGAAGGAACAGGTGCCTGCTATGGGAGTGGAAT 

ATGTTCATGTTCGGGGGATGTAACCTACCACGACCCACCACTCCTCTTTGACATCTCA 

AGAG ACCCTTCAGAAGCCCTTCCACTGAACC CTGACAATGAGC CATT ATTTGACTCCG 

TGATCAAAAAGATGGAGGCAGCCATAAGAGAGCATCGTAGGACACTAACACCTGTCCC 

ACAGCAGTTCTCTGTGTT CAACACAATTTGG AAAC CATGG CTGCAGC CTTG CTGTGGG 

ACCTTCCCCTTCTGTGGGTGTGACAAGGAAGATGACATCCTTCCCATGGCTCCCTGAG 

ACCATGCGGACCACGTGTTACCCACCACAAACTTACTGTTACAATGGTCATAGGAGCA 


GAG CTCAC CTG ACTG ATTCATTCCATTTG 




ORF Start: ATG at 122 


ORF Stop: TGA at 1853




SEQ ID NO: 338 


577 aa 


MW at 65099.5kD 


NOV121a, 

CG59938-01 Protein Sequence 


MS LVCALLNTCQAHRVHDD KPN I VL I MVDDLG I GDLGC YGNDTMRT PH I DRLAREGVR 
LTQHISAASLCSPSRSAFLTGRYPIRSGMVSSGNRRVIQNLAVPAGLPLNETTLAALL 
KKQGYSTGLIGKLGKWHLGLSCASRNDHCYHPLNHGFHYFYGVPFGLLSDCQASKTPE 
LHRWLRI KLW I STVALALVPFLLL I PKFARWFS VPWKVI FVF ALLAFL F FT SWYS S YG 
FTRRWNCILMRNHEIIQQPMKEEKVASLMLKEAIAFIERYKREPFLLFFSFLHVHTPL 
ISKKKFVGRSKYGRYGDNVEEMDWMVGGKIxU3ALDQERLANHTLVYFTSDNGGHLEPL 
DGAVQLGGWNGIYKGGKGMGGWEGGIRVPGIFRWPSVLEAGRVINEPTSLMDIYPTLS 
YIGGGILSQDRVIDGQNIi^PLLEGRASHSDHEFLFHYCGVYLHTVRWHQKDTVWKAHY 
VTPKFTPEGTGACYGSGICSCSGDVTYHDPPLLFDISRDPSEALPLNPDNEPLFDSVI 
KKMEAAIREHRRTLTPVPQQFSVFNTIWKPWLQPCCGTFPFCGCDKEDDILPMAP 



Further analysis of the NOV121a protein yielded the following properties shown in
Table 121B.



Table 121B. Protein Sequence Properties NOV121a

PSort 
analysis: 


0.6400 probability located in plasma membrane; 0.4600 probability located in 
Golgi body; 0.3700 probability located in endoplasmic reticulum (membrane);
0.1000 probability located in endoplasmic reticulum (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV121a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 121C.



Table 121C. Geneseq Results for NOV121a


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV121a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM78688 


Human protein SEQ ID NO 1350 - 
Homo sapiens, 590 aa. 
[WO200157190-A2, 09-AUG-2001] 


1..572 
10..580 


388/576 (67%) 
449/576 (77%) 


0.0 


AAM39343 


Human polypeptide SEQ ID NO 
2488 - Homo sapiens, 589 aa. 
[WO200153312-A1, 26-JUL-2001] 


20..571 
37..587 


331/555 (59%) 
404/555 (72%) 


0.0 


AAM41129 


Human polypeptide SEQ ID NO 
6060 - Homo sapiens, 646 aa. 
[WO200153312-A1, 26-JUL-2001] 


20..571 
94..644 


331/555 (59%) 
404/555 (72%) 


0.0 


AAY39920 


Human steroid sulphatase protein 
sequence - Homo sapiens, 583 aa. 
[WO9950453-A1, 07-OCT-1999] 


20..569
26..575 


295/559 (52%) 
374/559 (66%) 


e-166 


AAB51185 


Human sulfatase protein C SEQ ID 
NO: 14 - Homo sapiens, 583 aa. 
[US6153188-A, 28-NOV-2000] 


20..569 
26..575 


294/559 (52%) 
372/559 (65%) 


e-165 



In a BLAST search of public sequence databases, the NOV121a protein was found to
have homology to the proteins shown in the BLASTP data in Table 121D.



Table 121D. Public BLASTP Results for NOV121a

Protein
Accession
Number


Protein/Organism/Length 


NOV121a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


P54793 


Arylsulfatase F precursor (EC 3.1.6.-) 
(ASF) - Homo sapiens (Human), 591 
aa. 


1..572 
10..581 


379/577 (65%)
441/577(75%) 


0.0 


AAH20229 


HYPOTHETICAL 64.9 KDA 
PROTEIN - Homo sapiens (Human), 
593 aa. 


4..574 
24..593 


358/574(62%) 
440/574(76%) 


0.0 


P51689 


Arylsulfatase D precursor (EC 3.1.6.-)
(ASD) - Homo sapiens (Human), 593 
aa. 


4..574 
24..593 


349/574 (60%) 
429/574 (73%) 


0.0 


P51690 


Arylsulfatase E precursor (EC 3.1.6.-) 
(ASE) - Homo sapiens (Human), 589 
aa. 


20..571 
37..587 


334/555 (60%) 
405/555 (72%) 


0.0 


P08842 


Steryl-sulfatase precursor (EC 3.1.6.2) 
(Steroid sulfatase) (Steryl-sulfate
sulfohydrolase) (Arylsulfatase C) 
(ASC) - Homo sapiens (Human), 583 
aa. 


20..569 
26..575 


295/559(52%) 
374/559(66%) 


e-166 



PFam analysis predicts that the NOV121a protein contains the domains shown in
Table 121E.



Table 121E. Domain Analysis of NOV121a 


Pfam Domain                 NOV121a Match Region    Identities/Similarities for the Matched Region    Expect Value
Sulfatase: domain 1 of 1    21..504                 231/530 (44%), 410/530 (77%)                      1e-187



Example 122. 



The NOV122 clone was analyzed, and the nucleotide and predicted polypeptide
sequences are shown in Table 122A.



Table 122A. NOV122 Sequence Analysis



SEQ ID NO: 339 


3005 bp 


NOV122a, 

CG59746-01 DNA Sequence 


ATTTCTTTGGTGTTGTCTT CAC AG CTGAACTTGC AAAACAGATTGG AACTTCAAGATT 


ATCAATAATCGGAGATACGTATATTTTATTTGTAAAGAAAACATGGCTGCCCTATTCC 


TACGTGGTTTTGTCCAAATAGGGAACTGCAAGACTGGGATATCTAAGTCAAAAGAAGC 
ATTCATTGAAGCAGTGGAAAGAAAGAAGAAAGATAGACTGGTGCTGTATTTCAAAAGT 
GGAAAATAT AGCAC TTTTCGGCTAAGTGAT AATATTCAAAATGTAGTCCTTAAAT CCT 
ATAGAGGAAACCAAAATCACCTGCATTTAACTTTACAAAATAATAATGGCTTGTTTAT 
TGAAGGATTATCCTCCACAGATGCTGAACAATTGAAGATATTCTTGGACAGAGTTCAT 
CAAAACGAGGTTCAGCCACCTGTGAGACCTGGTAAGGGTGGGAGTGTCTTTTCTAGCA 
CAACACAG AAGG AAAT CAAC AAAAC TT CATT C CA CAAAGTTG ATG AG AAAT CAAGT AG 
CAAATCTTTTGAGATAGCAAAAGGAAGTGGGACAGGTGTCCTTCAGAGGATGCCTTTG 
CTTACATCAAAATTGACACTTACTTGCGGAGAGTTATCAGAAAATCAGCACAAGAAGA 
GGAAAAGAATGCTCTCATCTAGCTCAGAGATGAATGAGGAATTCTTGAAAGAAAATAA 
TTCTGTAGAATACAAGAAATCCAAGGCAGATTGTTCGAGGTGTGTAAGCTATAATCGA 
GAGAAACAATTGAAGTTAAAAGAGTTAGAAGAGAATAAGAAATTGGAATGTGAATCTT 
CATGCATCATGAACGCCACTGGAAATCCTTACCTAGATGACATTGGTCTTCTCCAAGC 
TCTCACTGAGAAAATGGTTTTGGTATTTCTGTTACAACAAGGGTATAGTGACGGTTAC 
ACAAAGTGGGATAAATTAAAACTATTTTTTGAATTATTTCCAGAGAAAATATGCCACG 
GCCTCCCCAATTTGGGAAACACCTGTTATATGAATGCAGTGTTACAGTCTCTACTTTC 
AATCCCATCGTTTGCTGATGATTTACTTAATCAGAGTTTCCCATGGGGTAAAATTCCC 
CTTAATGCTCTTACCATGTGCTTGGCACGGCTACTTTTTTTTAAAGATACCTATAATA 
TAGAAATCAAGGAGATGTTACTCTTGAATCTTAAAAAGGCCATTTCAGCAGCTGCAGA 
GATATTCCATGGCAATGCACAGAACGATGCTCATGAGTTTTTAGCTCACTGTTTAGAT 
CAACTGAAAGATAACATGGAAAAACTCAACACAATTTGGAAGCCTAAAAGTGAATTTG 
GGGAAG AT AATTTT CCT AAACAGGTTTTTGCTG ATGATCCTG ACAC CAGTGGGTTTTC 
TTGCCCTGTCATTACTAATTTTGAGTTAGAGTTGTTGCACTCCATTGCTTGTAAAGCT 
TOTGGTCAGGTTATTCTCAAGACAGAACTGAATAATTACCTCTCCATCAACCTTCCCC 
AAAGAATAAAAGCACATCCTTCATCTATTCAGTCTACTTTTGATCTTTTTTTTGGAGC 
AGAAGAGCTTGAGTATAAATGTGCAAAATGTGAGCACAAGACTTCCGTTGGAGTGCAC 
TCATTCAGTAGGCTACCTAGAATCCTTATTGTTCACCTCAAACGCTATAGCTTGAATG 
AGTTTTG TG CATT AAAGAAGAATGACC AG GAAGT CATC ATTTC CAAATATTTAAAGG T 
GTCTTCTCATTGCAATGAAGGCACCAGACCACCTCTTCCCTTGAGTGAGGATGGAGAA 
ATTACAGATTTCCAATTATTAAAAGTTATTCGAAAGATGACTTCTGGAAACATCAGTG 
TATCATGG CCTGCAACAAAGGAATCC AAAG AT AT CCTGG CTCCACAC ATTGGATCAG A 
TAAGGAGTCTGAACAAAAAAAAGGCCAGACAGTCTTTAAAGGGGCAAGCAGAAGACAG 
C AGC AAAAGTACCTTGG AAAAAATTCTAAACCAAATG AG CTAGAAT CTGTATACTCAG 
GAGATCGAGCATTCATTGAAAAAGAACCGTTAGCTCACTTAATGACGTATCTGGAAGA 
TACCTCACnTTGTCAGTTCCACAAAGCTGGAGGTAAACCTGCCAGCAGCCCAGGCACA 
CCTCTCTCAAAAGTTGACTTTCAAACAGTGCCCGAAAATCCAAAACGAAAGAAATATG 
TG AAAACCAG T AAGTTTG T AGCTTTTG AT AG G ATT AT CAATC CT ACT AAAG ATTTGT A 
TGAAGATAAAAATATCAGAATTCCAG AAAG ATTCCAAAAAGTGTCTGAACAG ACT CAG 
CAGTGTG ACGGTATG AGAATCTGTGAACAAG C CCCTCAGCAGGCACTGCCTCAAAGCT 
TTCCAAAGCCIAGGCACCCAGGGGCACACAAAGAACCTCCTAAGACCTACAAAATTAAA 
TCTACAGAAGTCTAACAGGAATTCCCTACTTGCACTGGGTTCCAATAAGAATCCAAGA 
AACAAAGACATTTTAGATAAGATAAAATCTAAAGCCAAGGAAACAAAAAGAAATGATG 
AT AAGGG AGAT C AT ACCT ACCGG CTCATT AG TGTTG T CAG CCATCTTGGG AAG ACT CT 
AAAGTC AGGC CATT AT ATCTGTGATG C CT ATG ACTTTG AG AAAC AG AT CTGGTT C ACT 
TACGATGATATGCGGGTGTTAGGTATCCAGGAGGCCCAGATGCAGGAGGATAGGCGTT 
G CACTGGGT ACATCTT CTTTT AC ATGCAT AATG AG ATC TTTG AAG AGATG TTG AAAAG 
AG AAGAG AATGC CCAG CTT AAT AGCAAGG AGGT AGAGG AG AC C CT T CAG AAGG AATAA 
GAGGAACX3TACTCCTCCTTGTACAGATCTGCCTGACTGTCTCACTCGATACCACTTCC 


TCCATGGAAGGAAAACTGTGAACTTTATCCAGAGATGAAAATGCAATTAGTCTAGGAC 


CAAAGGTCAAACAGAAACACTTAATGGGGAGATCTGCATTCTAATCC 




ORF Start: ATG at 101 


ORF Stop: TAA at 2840




SEQ ID NO: 340 


913 aa 


MW at 104046.0kD


NOV122a, 

CG59746-01 Protein Sequence 


MAALFLRG FVQ IGNCKTG I S KS KEAFI EAVERKKKDRLVL YFKS GKYSTFRLSDN I QN 
WLKSYRGNQNHLHLTLQNNNGLF I EGLSSTDAEQLKI FLDR VHQNEVQP PVRPGKGG 
SVFSSTTQKEINKTSFHKVDEKSSSKSFEIAKGSGTGVLQRMPLLTSKIiTLTCGELSE 
NQHKKRKRMLSSS S EMNEEFLKENNS VE YKKS KADCSRCVS YNREKQLKLKELEENKK 
LECESSCIMNATGNPYLDDIGLLQALTEKMVLVFLLQQGYSDGYTKWDKLKLFFELFP 
EKI CHGLPNLGNTC YMNAVLQS LLS I PS FADDLLNQSF PWGKI PLNALTMCLARLLFF 
KDT YNI E I KEMLLLNLKKA I S AAAE I FHGN AQNDAH E FLAHC LDQLKDNMEKLNT I WK 
PKSEFGEDNFPKQVFADDPDTSGFSCPVITNFELELLHS I ACKACGQVI LKTELNNYL 
SINLPQRIKAHPSSIQSTFDLFFGAEELEYKCAKCEHKTSVGVHSFSRLPRILIVHLK 
RYSLNEFCALKKNDQEVIISKYLKVSSHCNEGTRPPLPLSEDGEITDFQLLiCVIRKMT 
SGN I SVSWPATKES KD I LAPH IGSDKES EQKKGQTVFKGASRRQQQKYLGKNSKPNEL 
E SVYSGDRAF I EKE PLAHLMTYLEDTSLCQFHKAGGKPASS PGT PLSKVD FQTVPENP
KRKKYVKTSKFVAFDRI INPTKDLYEDKNI RI PERFQKVSEQTQQCDGMRICEQAPQQ
ALPQ S F PKPGTQGHTKNLLRPT KLNLQKSNRNS LLALGSNKN PRNKD I LDK I KS KAKE 
TKRNDDKGDHT YRL I S WS HLG KT LKSGHY I CD AYD FE KQ I WFT YDDMR VLG I QEAQM 
QEDRRCTGYI FFYMHNEI FEEMLKREENAQLNSKEVEETLQKE 



Further analysis of the NOV122a protein yielded the following properties shown in
Table 122B. 



Table 122B. Protein Sequence Properties NOV122a 


PSort 
analysis: 


0.7000 probability located in nucleus; 0.4270 probability located in 
mitochondrial matrix space; 0.3000 probability located in microbody 
(peroxisome); 0.1047 probability located in mitochondrial inner membrane 


SignalP 
analysis: 


Likely cleavage site between residues 16 and 17 



A search of the NOV122a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 122C.



Table 122C. Geneseq Results for NOV122a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV122a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAU07888 


Polypeptide sequence for human 
hspG25 - Homo sapiens, 913 aa. 
[WO200166752-A2, 13-SEP-2001] 


1..913 
1..913 


913/913 (100%) 
913/913 (100%) 


0.0 


AAB75607 


Human cancer associated antigen 
precursor HOM-TES-84/6 SEQ ID 
NO:6 - Homo sapiens, 912 aa. 
[WO200100874-A2, 04-JAN-2001] 


1..905 
1..904 


429/920 (46%) 
566/920 (60%) 


0.0 


AAU07869 


Polypeptide sequence for mammalian 
Spg25 - Mammalia, 835 aa. 
[WO200166752-A2, 13-SEP-2001] 


1..904 
1..834 


335/921 (36%) 
504/921 (54%) 


e-147 


AAG75460 


Human colon cancer antigen protein 
SEQ ID NO:6224 - Homo sapiens, 
109 aa. [WO200122920-A2, 05-APR- 
2001] 


810..912 
3..107 


61/105 (58%) 
79/105 (75%) 


3e-28 


AAB39364 


Gene 8 human secreted protein 
homologous amino acid sequence 
#113 - Bos taurus, 64 aa. 
[WO200057903-A2, 05-OCT-2000] 


810..871 
1..64 


39/64 (60%) 
48/64 (74%) 


5e-15 
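The Expect values in these tables mix three notations: a plain "0.0", a full mantissa-exponent form such as "3e-28", and a shorthand such as "e-147". The following is a minimal sketch of how such entries might be normalized to numbers for sorting or filtering. It assumes that the shorthand "e-147" stands for 1e-147; the function name parse_expect is hypothetical and is not part of any tool named in this document.

```python
def parse_expect(text: str) -> float:
    """Convert an Expect-value string as printed in the tables to a float.

    Assumption: a bare exponent such as "e-147" means 1e-147.
    """
    cleaned = text.strip().lower()
    if cleaned.startswith("e"):
        # Supply the implied mantissa of 1.
        cleaned = "1" + cleaned
    return float(cleaned)


# Expect values as listed in Table 122C.
table_122c = ["0.0", "0.0", "e-147", "3e-28", "5e-15"]
values = [parse_expect(v) for v in table_122c]

# Smaller Expect values indicate more significant matches.
ranked = sorted(values)
print(ranked)
```

Sorting in ascending order places the most significant matches first, which is the order in which the table already lists them.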
In a BLAST search of public sequence databases, the NOV122a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 122D. 



Table 122D. Public BLASTP Results for NOV122a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV122a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9BXU7 


Ubiquitin carboxyl-terminal hydrolase 
26 (EC 3.1.2.15) (Ubiquitin 
thiolesterase 26) (Ubiquitin-specific 
processing protease 26) 
(Deubiquitinating enzyme 26) - Homo 
sapiens (Human), 913 aa. 


1..913 
1..913 


913/913 (100%) 
913/913 (100%) 


0.0 


Q9HBJ7 


UBIQUITIN-SPECIFIC 
PROCESSING PROTEASE - Homo 
sapiens (Human), [...] aa. 


1..905 
1..904 


429/920 (46%) 
566/920 (60%) 


0.0 


Q9HCH8 


KIAA1594 PROTEIN - Homo sapiens 
(Human), 931 aa (fragment). 


50..912 
3..929 


393/932 (42%) 
535/932 (57%) 


e-171 


Q99MX1 


Ubiquitin carboxyl-terminal hydrolase 
26 (EC 3.1.2.15) (Ubiquitin 
thiolesterase 26) (Ubiquitin-specific 
processing protease 26) 
(Deubiquitinating enzyme 26) - Mus 
musculus (Mouse), 835 aa. 


1..904 
1..834 


335/921 (36%) 
504/921 (54%) 


e-147 


Q9ES63 


UBIQUITIN-SPECIFIC 
PROCESSING PROTEASE - Mus 
musculus (Mouse), 869 aa. 


1..908 
1..848 


341/933 (36%) 
480/933 (50%) 


e-131 
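Each "Identities" entry in the table pairs a fraction with a whole-number percentage, for example 341/933 (36%). The sketch below parses such an entry and recomputes the percentage. It assumes the printed identity percentages are truncated rather than rounded, which appears consistent with the identity rows of Table 122D (341/933 is about 36.5% and is printed as 36%) but is not stated anywhere in the document. The helper names parse_match and truncated_percent are hypothetical.

```python
import re
from typing import Tuple


def parse_match(entry: str) -> Tuple[int, int, int]:
    """Split an entry such as '335/921 (36%)' into (matched, aligned, percent)."""
    m = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)\s*", entry)
    if m is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def truncated_percent(matched: int, aligned: int) -> int:
    """Whole-number percentage, truncated toward zero (an assumption)."""
    return matched * 100 // aligned


# Identity entries as listed in Table 122D.
identity_entries = [
    "913/913 (100%)",
    "429/920 (46%)",
    "393/932 (42%)",
    "335/921 (36%)",
    "341/933 (36%)",
]

for entry in identity_entries:
    matched, aligned, printed = parse_match(entry)
    recomputed = truncated_percent(matched, aligned)
    print(entry, "->", recomputed, "consistent" if recomputed == printed else "differs")
```

A check of this kind can help flag entries whose digits were damaged in transcription, since a corrupted numerator or denominator will usually no longer agree with the printed percentage.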



PFam analysis predicts that the NOV122a protein contains the domains shown in 
Table 122E. 



Table 122E. Domain Analysis of NOV122a 


Pfam Domain 


NOV122a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


UCH-1: domain 1 of 1 


295..326 


21/32 (66%) 
29/32 (91%) 


8.8e-12 


UCH-2: domain 1 of 1 


820..885 


20/72 (28%) 
47/72 (65%) 


2.2e-11 
Example 123. 

The NOV123 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 123A. 



Table 123A. NOV123 Sequence Analysis 




SEQ ID NO: 341 


2146 bp 


NOV123a, 

CG88613-01 DNA Sequence 


GAAGGAGCGGGCATGAGGCG CTGCCCGTGCCGTGGGAG CCTGAACG AGGCGGAGGCCG 

GGGCGCTGCCCGCGGCGGCCCGCATGGGACTGGAGGCGCCGCGAGGAGGGCGGCGGCG 

GCAGCCGGGACAGCAGCGACCTGGGCCCGGCGCAGGGGCCCCGGCGGGGCGGCCGGAG 

GGGGGCGGGCCCTGGG CCCGGACAGAGGGGTCC AG CCTCCACAG CGAGCCTG AGAGGG 

CCGGCCTCGGGC CTGCGCCGGGGAC AGAGAGTCCG CAGGCAGAATTCTGGAC AGACGG 

ACAGACTG AGCC CG CGGCAG CTGGCCTTGGAGTAG AGACCGAGAGGCCC AAGC AAAAG 

ACGG AG CCAGACAGGTCCAG CCTCCGGACGCATCT AGAATGGAGCTGGTCAGAGCTGG 

AGACGACTTGTCTTTGGACGGAGACCGGGACAGATGGCCTTTGGACTGATCCGCACAG 

GTCCGACCTCCAGTTTCAGCCCGAGGAGGCCAGCCCCTGGACACAGCCAGGGGTTCAT 

GGGCCCTGGACAGAGCTGGAAACGCATGGGTCACAGACTCAGCCAGAGAGGGTCAAGT 

CCTGGGCTGATAACCTCTGGACCCACCAGAACAGTTCCAGCCTCCAGACTCACCCAGA 

AGGAGCCTGTCCCTCAAAAGAGCCAAGTGCTGATGGCTCCTGGAAAGAATTGTATACT 

GATGGCTCCAGGACACAACAGGATATTGAAGX3TCCCTGGACAGAGCCATATACTGATG 

G CTCCCAG AAAAAACAGG ATACTG AAGCAG CCAGG AAACAGCCTGG CACTGGTGGTTT 

CCAAATACAACAGGATACTGATGGCTCCTGGACACAACCTAGCACTGACGGTTCCCAG 

AC AG CAC CTGGGACAG A CTG CCT CTTGGGAG AGCCTG AGG ATGG CCCATT AG AGGAAC 

CAGAGC CTGG AGAATTG CTGACTCACCTCTACTCTCACCTGAAGTGTAGCCCCCTGTG 

CCCTGTGCCCCGCCTCATCATTACCCCTGAGACCCCTGAGCCTGAGG 

GGACCCCCCTCCCGGGTTGAGGGGGGCAGCGGCGGCTTCTCCTCTGCCTCTTCTTTCG 

ACGAGTCTGAGGATGACGTGGTGGCCGGGGGCGGAGGTGCCAGCGATCCCGAGGACAG 

GTCTGGGAGCAAACCCTGGAAGAAGCTGAAGACAGTTCTGAAGTATTCACCCTTTGTG 

GTCTCCTTCCGAAAACACTACCCTTGGGTCCAGCTTTCTGGACATGCTGGGAACTTCC 

AGGCAGGAGAGGATGGTCXKSATTCTGAAACXSfTTCTGTCAGTGTGAGCAGCGCAGCCT 

G GAG CAG CTG ATGAAAGAC C CG CTG CG ACCTTTCG TG CCTGC CT ACT ATGG CATGGTG 

CTGCAGGATGGCCAGACCITCAACCAGATGGAAGACCTCCTCGCTGACTTTGAGGGCC 

CCTCCATTATGGACTGCAAGATGGGCAGCAGGACCTATCTGGAAGAGGAGCTAGTGAA 

GGCACGGGAACGTCCCCGTCCCCGGAAGGACATGTATGAGAAGATGGTGGCTGTGGAC 

CCTGGGGCCCCTACCCCTGAGGAGCATGCCCAGGGTGCAGTCACCAAGCCCCGCTACA 

TCCAGTGGAGGGAAACCATGAGCTCCACCTCTACCCT<X^CTTCCGGATCGAGGGCAT 

CAAGAAGGCAGATG^ACCrGTAACACCAACTTCAAGAAGACGCAGGCACTG^ 

GTGACAAAAGTCCTGGAGGACTTCGTGGATGGAGACCACGTCATCCTGCAAAAGTACG 

TCGCATGCCTAGAAGAACTTCGTGAAGCTCTGGAGATCTCCCCCTTCTTCAAGACCCA 

CGAGGTGGTAGGCAGCTCCCTCCTCTTCGTGCACGACCACACCGGCCTGGCCAAGGTC 

TGGATGATAGACTTCGGCAAGACGGTGGCCTTGCCCGACCACCAGACGCTCAGCCACA 

GG CTG C C CTGGGCTG AGG G C AA C CG TGAGGACG G CT AC CTCTGGGG C CTGG AC AACAT 

GATCTGCCTCCTGCAGGGGCTGGCACAGAGCTGAGCTGCTCAGCCACCATCAGGTTAA 

TTGGATGGCGCCAGTCTGGCTGGAGGAGCCCTGAGATGCCATGGGAGGCCTGAGGTTG 




ORF Start: ATG at 13 


ORF Stop: TGA at 2062 




SEQ ID NO: 342 


683 aa MW at 75206.8kD 


NOV123a, 

CG88613-01 Protein Sequence 


MRRCPCRG SLNEAEAGALPAAARMGLEAPRGGRRRQ PGQQR PG PGAGAP AGR P EGGG P 
WARTEGSSLHSEPERAGLGPAPGTESPQAEFWTDGQTEPAAAGLGVETERPKQKTEPD 
RS SLRTHLE WS WS ELETTCLWTETGTDGLWTD PHRS DLQFQ PE EAS PWTQPG VHG PWT 
ELETHGSQTQPERVKSWADNLWTHQNSSSLQTHPEGACPSKEPSADGSWKELYTDGSR 
TQQDIEGPWTEPYTDGSQKKQDTEAARKQPGTGGFQIQQDTDGSWTQPSTDGSQTAPG 
TDCLLGEPEDGPLEEPEPGELLTHLYSHLKCSPLCPVPRLIITPETPEPEAQPVGPPS 
RVEGGSGGFSSASSFDESEDDWAGGGGASDPEDRSGSKPWKKl^TVLKYSPFVVSFR 
KHYPWVQLSGHAGNFQAGEDGRILKRFCQCEQRSLEQLMKDPLRPFVPAYYGMVLQDG 
QTFNQMEDLLADFEGPSIMDCKMGSRTYLEEELVKARERPRPRKDMYEKMVAVDPGAP 
TPEEHAQGAVTKPRYMQWRETMS STSTLGFRI EGI KKADGTCNTNFKKTQALEQVTKV 
LEDF VDGDHVI LQKYVACLEELREALE I SPFFKTHE WGSS LLFVHDHTGLAKVWMI D 
FGKTVALPDHQTLSHRLPWAEGNREDGYLWGLDNMICLLQGLAQS 
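The ORF coordinates and the protein length reported in Table 123A can be cross-checked against each other. The sketch below assumes that both reported positions are 1-based and give the first nucleotide of the start codon and of the stop codon respectively; under that reading, the distance between them divided by three should equal the number of encoded residues. The function name orf_codon_count is hypothetical.

```python
def orf_codon_count(start: int, stop: int) -> int:
    """Number of coding codons in an ORF.

    Assumptions: `start` is the 1-based position of the first base of the
    start codon, `stop` is the 1-based position of the first base of the
    stop codon, and the stop codon itself is not counted.
    """
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3


# NOV123a: ORF Start ATG at 13, ORF Stop TGA at 2062, reported length 683 aa.
nov123a_codons = orf_codon_count(start=13, stop=2062)
print(nov123a_codons)  # 683
```

The result agrees with the 683 aa length given for SEQ ID NO: 342. The same check can be applied to any other clone whose ORF coordinates and protein length are both listed.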



Further analysis of the NOV123a protein yielded the following properties shown in 
Table 123B. 
Table 123B. Protein Sequence Properties NOV123a 


PSort 
analysis: 


0.5663 probability located in microbody (peroxisome); 0.3000 probability 
located in nucleus; 0.1000 probability located in mitochondrial matrix space; 
0.1000 probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
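The PSort result is a semicolon-separated list of probability and compartment pairs. As a minimal sketch, the text can be split into pairs and the highest-scoring compartment selected. The string below is transcribed from Table 123B; the function name parse_psort is hypothetical, and the sketch assumes every entry follows the "<value> probability located in <compartment>" pattern.

```python
import re
from typing import List, Tuple

PSORT_TEXT = (
    "0.5663 probability located in microbody (peroxisome); "
    "0.3000 probability located in nucleus; "
    "0.1000 probability located in mitochondrial matrix space; "
    "0.1000 probability located in lysosome (lumen)"
)


def parse_psort(text: str) -> List[Tuple[str, float]]:
    """Return (compartment, probability) pairs from a PSort summary line."""
    pairs = []
    for part in text.split(";"):
        m = re.match(r"\s*([0-9.]+)\s+probability located in\s+(.+?)\s*$", part)
        if m:
            pairs.append((m.group(2), float(m.group(1))))
    return pairs


predictions = parse_psort(PSORT_TEXT)
best = max(predictions, key=lambda pair: pair[1])
print(best)  # ('microbody (peroxisome)', 0.5663)
```

Note that the listed values are independent scores and need not sum to one, so the sketch only ranks them and does not normalize them.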



A search of the NOV123a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 123C. 



Table 123C. Geneseq Results for NOV123a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV123a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM41393 


Human polypeptide SEQ ID NO 
6324 - Homo sapiens, 687 aa. 
[WO200153312-A1, 26-JUL-2001] 


1..683 
5..687 


682/683 (99%) 
682/683 (99%) 


0.0 


AAM39607 


Human polypeptide SEQ ID NO 
2752 - Homo sapiens, 711 aa. 
[WO200153312-A1, 26-JUL-2001] 


12..683 
36..711 


642/680 (94%) 
643/680 (94%) 


0.0 


AAE04364 


Human kinase (PKIN)-5 - Homo 
sapiens, 798 aa. [WO200146397-A2, 
28-JUN-2001] 


273..682 
380..793 


219/432 (50%) 
285/432 (65%) 


e-117 



In a BLAST search of public sequence databases, the NOV123a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 123D. 



Table 123D. Public BLASTP Results for NOV123a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV123a 
Residues/ 

Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96DU7 


INOSITOL 1,4,5- 
TRISPHOSPHATE 3-KINASE C - 
Homo sapiens (Human), 683 aa. 


1..683 
1..683 


683/683 (100%) 
683/683 (100%) 


0.0 


Q9Y475 


INOSITOL 1,4,5- 


83..683 
4..604 


601/601 (100%) 
601/601 (100%) 


0.0 
ISOENZYME (EC 2.7.1.127) - 
Homo sapiens (Human), 604 aa 
(fragment). 








S17682 


1D-myo-inositol-trisphosphate 3-kinase 
(EC 2.7.1.127) B - human, 
472 aa. 


273..682 
54..467 


219/432 (50%) 
285/432 (65%) 


e-117 


CAB65055 


INOSITOL 1,4,5- 
TRISPHOSPHATE 3-KINASE B - 
Homo sapiens (Human), 946 aa. 


273..682 
528..941 


219/432(50%) 
285/432(65%) 


e-117 


Q96JS1 


INOSITOL 1,4,5- 
TRISPHOSPHATE 3-KINASE, 
ISOFORM B (EC 2.7.1.127) - Homo 
sapiens (Human), 946 aa. 


273..682 
528..941 


219/432(50%) 
285/432 (65%) 


e-117 



PFam analysis predicts that the NOV123a protein contains the domains shown in 
Table 123E. 



Table 123E. Domain Analysis of NOV123a 



Pfam Domain 



NOV123a Match Region 



Identities/ 
Similarities 
for the Matched Region 



Expect Value 



No Significant Matches Found 



Example 124. 

The NOV124 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 124A. 



Table 124A. NOV124 Sequence Analysis 




SEQ ID NO: 343 


1395 bp 


NOV124a, 

CG59993-01 DNA Sequence 


GGTAAGACGACCTCTGGATGCTCACCCTGCCCTCTTCACCTCTCGTCCCCAGCTGTTT 


CCTCTGCCACCATGAGGAACATTTTCAAGAGGAACCAGGAGCCTATTGTGGCTCCTGC 
CACCACCACCGCCACGATGCCCATTGGACCCGTGGACAACTCCACTGAGAGTGGGGGT 
GCTGGGGAGAGCCAGGAGGACATGTTTGCCAAACTGAAGGAGAAGTTATTCAATGAGA 
TAAACAAGATTCCCTTACCACCCTGGGCACTG ATCGCCATTG CTGTGGTTG CTGGGCT 
CCTG CTT CTCACCTGCTGCTTCTG CATCTG CAAG AAATGCTG CTGC AAG AAGAAGAAG 
AACAAGAAGGAGAAGGGCAAAGGCATGAAGAATGCCATGAACATGAAGGACATGAAAG 
GGGGTCAGGATGACGACGACGCAGAGACAGGCCTGACTGAGGGGGAAGGTGAAGGGGA 
GGAGGAGAAAGAGCCAGAGAACCTGGGCAAACTGCAGTTTTCCCTGGACTATGATTTT 
CAGGCTAATCAGCTTACTGTGGGCGTTCTGCAGGCTGCTGAACTGCCTGCCCTGGACA 
TGGGAGGCACCTCAGACCCTTATGTCAAGGTCTTCCTCCTTCCTGACAAGAAGAAGAA 
ATATGAGACCAAAGTCCATCGGAAGACACTGAACCCTGCCTTCAATGAAACCTTCACC 
TTCAAGGTGCCATACCAGGAGCTTGGGGGCAAAACTCTGGTGATGGCCATCTATGACT 
TTGACCGCTTCTCCAAACATGACATCATTGGAGAGGTAAAGGTGCCTATGAACACAGT 
GG ACCTCGGCCAGCCCATTGAGG AGTGGAG AGACCTG C AAGG CGGGGAAAAGGAGG AG 
CCGGAGAAGCTGGGCGACATCTGCACCTCCCTGCGCTATGTGCCCACGGCCGGGAAGC 
TCACTGTCTGCATCCTGGAGGCTAAGAACCTCAAGAAGATGGACGTGGGCGGCCTTTC 
AGACCCGTACGTGAAGATCCACCTGATGCAGAATGGCAAGAGGCTCAAGAAGAAGAAG 
AC AACCGTGAAG AAGAAG AC CCTGAAC CCATACTTCAACG AGTC CTTCAGCTTTG AGA 
TCCCCTTCGAGC AG ATTCAG AAAGT CCAGGTAGTGGTCACCGTG CTGG ACTATGACAA 
G CTGGGCAAG AACX3AAG CCATAGGCAAGATCTTCGTGGGC AGCAATG CCACGGG CACA 
G AGCTGCGGCACTGGTCCG AC ATGCTGG CCAACCCC CGG AGG CCCATCG CCCAGTGG C 
ACTCGCTCAAGCCTGAGGAGGAGGTGGATGCACTCCTGGGCAAGAACAAGTAGACAGC 
AGCGG CTGGGACCCCACACCTTTCACGG ACACTGAC AAG ATCCAG AGCTAT CAATACC 


TCA 




ORF Start: ATG at 70 


ORF Stop: TAG at 1327 




SEQ ID NO: 344 


419 aa 


MW at 46871.8kD 


NOV124a, 

CG59993-01 Protein Sequence 


MRKI F KRNQE P I VA P ATTTATM PIG PVDNSTE S GG AG ESQEDMF AKLKEKL FNE INK I 
PLPPWALI AI AWAGLLLLTCCFC I CKKCCCKKKKNKKEKGKGMKNAMNMKDMKGGQD 
DDDAETGLTEGEGEGEEEKEPE^LGKLQFSLDYDFQANQLTVGVLQAAELPALDMGGT 
SDPYVKVFLLPDKKKKYETKVHRKTLNPAFNETFTFKVPYQELGGKTIiVMAIYDFDRF 
SKHDI IGE VKVPMNTVDLGQP I EEWRDLQGGEKEE PEKLGDI CTS LRYVPTAGKLTVC 
ILEAKNLKJChTOVGGLSDPYVKIHI^QNGK^LKKKKTTVKKKTLNPyFNESFSFEIPFE 
QIQKVQVVVTVLDYDKLGKNEAIGKIFVGSNATGTELRHWSDMLANPR^ 
PEEEVDALLGKNK 




SEQ ID NO: 345 


1338 bp 


NOV124b, 

CG59993-02 DNA Sequence 


CC ACC ATGAGGAACATTTTCAAGAGGAACCAGGAG CCT ATTGTGG CTCCTGCCAC CAC 
CACCGCCACGATGCCCATTGGACCCGTGGACAACTCCACTGAGAGTGGGGGTGCTGGG 
G AG AGT CAGG AGG ACATGTTTG CCAAACTGAAG G AG AAG TT ATT CAATGAG AT AAACA 
AGATTCCCTTACCACCCTGGGCACTGATCGCCAT^GCTGTGGTTGCTGGGCTCCTGCT 
TCT C ACCTGCTGCTTCTG CATCTGCAAGAAATG CTG CTGC AAGAAGAAGAAG AACAAG 
AAG G AG AAGGGC AAAG G T ATG AAG AATG CC ATGAA C ATG AAGGAC ATG AAAG GGGG TC 
AGGATGACGACGACGCAGAGACAGGCCTGACTGAGGGGGAAGGTGAAGGGGAGGAGGA 
G AAAG AGC CAG AGAAC CTGGG C AAACTG CAG TTTT C CCTGG ACT ATG ATTTT CAGG CT 
AATCAGCTTACrGTGGGCGTTCTGCAGGCTGCTGAACTGCCTGCCCTGGACATGGGAG 
GCACCTCAGACCCTTATGTCAAGGTCTTCCTCCTTCCTGACAAGAAGAAGAAATATGA 
GACCAAAGTCCATCGGAAGACACTGAACCCTGCCTTCAATGAAACCTTCACCTTCAAG 
GTGCCATACCAGGAGCTTGGGGGCAAAACTCTGGTGATGGCCATCTATGACTTTGACC 
G CTT CTCCAAAC ATGACATCATTGG AG AGGTAAAG G TG C CT ATG AACACAG TGGACCT 
CGG CCAGC CCATTGAGGAGTGGAG AGACCTGCAAGGCGGGGAAAAGGAGGAGCCGGAG 
AAGCTGGGCGACATCTGCACCTCCCTGCGCTATGTGCCCACGGCCGGGAAGCTCACTG 
TCTGCATC CTGGAGGCTAAGAAC CTCAAGAAGATGGACGTGGGCGGCCTTTCAGACCC 
GTACGTGAAGATCCACCTGATGCAGAATGGCAAGAGGCTCAAGAAGAAGAAGACAACC 
ATG AAGAAGAAG ACCCTGAACCCAT ACTTCAACG AGTCCTTCAG CTTTG AG ATC C CCT 
TCG AGCAGATTC AG AAAGTC CAGGT AGTGGTCAC CGTGCTGGACTATG ACAAGCTGGG 
CAAG AACGAAGCCATAGG CAAGATCTTCGTGGG CAG CAATGCCACGGGCACAGAG CTG 
CGGCACTGGTCCGACATGCTGGCCAACCCCCGGAGGCCCATCGCCCAGTGGCACTCGC 
T CAAGC CTGAGG AGG AGG TGGGTG CA CTC CTGGG CAAGAAC AAG TAGACAGC AG CG G C 
TGGGACCCCACACCTTTCACGGACACTGACAAGATCCAGAGCTATCAATAAGGTGTAG 


GCGG 




ORF Start: ATG at 6 


ORF Stop: TAG at 1263 




SEQ ID NO: 346 


419 aa 


MW at 46845.9kD 


NOV124b, 

CG59993-02 Protein Sequence 


MRNI FKRNQE PI VAPATTTATMP I GPVDNSTESGGAGESQEDMFAKLKEKLFNE I NKI 
PL PPWALI AI AWAGLLLLTCCFCI CKKCCCKKKKNKKEKGKGMKNAMNMKDMKGGQD 
DDDAETGLTEGEGEGEEE KE PENIiGKLQFSLDYDFQANQLTVGVLQAAELPALDMGGT 
SDPYVKVFLLPDKKKKYETKVHRKTLNPAFWETFTFKVPYQELGGKTLVMAIYDFDRF 
SKHDI I GE VKVPMNTVD LGQP I EEWRDLQGGEKEE PEKLGDI CTS LRYVPTAGKLTVC 
I LEAKNLKKMDVGGLSDPYVKI HLMQNGKRLKKKKTTMKKKTLNP YFNES FS FE I PFE 
QIQKVQVVVTVLDYDKI^KNEAIGKIFVGSNATGTELRHWSDMIANPRRPIAQWHSLK 
PEEEVGALLGKNK 



Sequence comparison of the above protein sequences yields the sequence 
relationships shown in Table 124B. 



Table 124B. Comparison of NOV124a against NOV124b. 


Protein Sequence 


NOV124a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV124b 


1..419 
1..419 


335/419(79%) 
335/419 (79%) 
Further analysis of the NOV124a protein yielded the properties shown in 
Table 124C. 



Table 124C. Protein Sequence Properties NOV124a 


PSort 
analysis: 


0.8202 probability located in mitochondrial inner membrane; 0.6000 probability 
located in endoplasmic reticulum (membrane); 0.3500 probability located in 
nucleus; 0.3034 probability located in mitochondrial intermembrane space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV124a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 124D. 



Table 124D. Geneseq Results for NOV124a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV124a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAR97722 


Mouse inositol polyphosphate 
binding protein IP4-BP - Mus 
musculus, 422 aa. [JP08092290-A, 
09-APR-1996] 


1..419 
1..422 


412/422 (97%) 
414/422 (97%) 


0.0 


AAU19715 


Human novel extracellular matrix 
protein, Seq ID No 365 - Homo 
sapiens, 461 aa. [WO200155368-A1, 
02-AUG-2001] 


128..405 
169..447 


141/280 (50%) 
201/280 (71%) 


2e-80 


AAU19714 


Human novel extracellular matrix 
protein, Seq ID No 364 - Homo 
sapiens, 295 aa. [WO200155368-A1, 
02-AUG-2001] 


141..409 
11..281 


140/273 (51%) 
193/273 (70%) 


3e-74 


AAW87702 


A human membrane fusion protein 
designated SYTAX2 - Homo sapiens, 
375 aa. [WO9856813-A2, 17-DEC-1998] 


59..407 
31..364 


146/352 (41%) 
220/352 (62%) 


4e-73 


AAO05534 


Human polypeptide SEQ ID NO 
19426 - Homo sapiens, 149 aa. 
[WO200164835-A2, 07-SEP-2001] 


33..164 
15..149 


127/135 (94%) 
131/135 (96%) 


5e-70 



In a BLAST search of public sequence databases, the NOV124a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 124E. 



Table 124E. Public BLASTP Results for NOV124a 



Protein 
Accession 
Number 


Protein/Organism/Length 


NOV124a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


P29101 


Synaptotagmin II (SytII) - Rattus 
norvegicus (Rat), 422 aa. 


1..419 
1..422 


411/422 (97%) 
414/422 (97%) 


0.0 


A55417 


synaptotagmin II - mouse, 422 aa. 


1..419 
1..422 


412/422 (97%) 
414/422 (97%) 


0.0 


P46097 


Synaptotagmin II (SytII) - Mus 
musculus (Mouse), 422 aa. 


1..419 
1..422 


411/422(97%) 
413/422 (97%) 


0.0 


P24506 


Synaptotagmin B (Synaptic vesicle 
protein O-P65-B) - Discopyge 
ommata (Electric ray), 439 aa. 


10..419 
27..439 


341/413 (82%) 
366/413 (88%) 


0.0 


P46096 


Synaptotagmin I (SytI) (p65) - Mus 
musculus (Mouse), 421 aa. 


10..419 
8..421 


323/418 (77%) 
353/418 (84%) 


0.0 



PFam analysis predicts that the NOV124a protein contains the domains shown in 
Table 124F. 



Table 124F. Domain Analysis of NOV124a 


Pfam Domain 


NOV124a Match 
Region 


Identities/ 
Similarities 
for the Matched 
Region 


Expect 
Value 


Adeno E3 CR2: domain 1 
of 1 


62..108 


16/50 (32%) 
26/50 (52%) 


6.5 


C2: domain 1 of 2 


156..242 


54/97 (56%) 
81/97 (84%) 


1.8e-42 


C2: domain 2 of 2 


287..375 


44/97 (45%) 
80/97 (82%) 


2.9e-39 
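The match regions in the domain table are inclusive residue ranges written as "start..end". A minimal sketch of how the two C2 regions might be turned into lengths and checked for overlap follows. It reads the first C2 region as 156..242, treats both endpoints as inclusive (an assumption about the table's convention), and uses hypothetical helper names.

```python
from typing import Tuple

Region = Tuple[int, int]


def parse_region(text: str) -> Region:
    """Parse a 'start..end' range, tolerating stray spaces."""
    start, end = text.replace(" ", "").split("..")
    return int(start), int(end)


def region_length(region: Region) -> int:
    """Length of a range when both endpoints are inclusive."""
    start, end = region
    return end - start + 1


def overlaps(a: Region, b: Region) -> bool:
    """True if two inclusive ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


# C2 match regions as listed for NOV124a.
c2_first = parse_region("156..242")
c2_second = parse_region("287..375")

print(region_length(c2_first), region_length(c2_second))  # 87 89
print(overlaps(c2_first, c2_second))  # False
```

Under these assumptions the two C2 matches cover 87 and 89 residues and do not overlap, which is what one would expect for two separate domain hits in the same protein.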



Example 125. 



The NOV125 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 125A. 



Table 125A. NOV125 Sequence Analysis 




SEQ ID NO: 347 


3226 bp 
NOV125a, 

CG59991-01 DNA Sequence 


GGACCACTTCTGATGCATCTCTGGGTCCCAACACTATCCACTGCAAGGCCTCGAAACA 


GGGGGG CCAG ATGGGACCCCCATTTAGCACAAG AGAGACGTCCACACTCTGTG AGCC C 
AAAGGGAG AAGG CT CAGGCCACGGCAGAGACGG AACCAGG AAAACGT CACGAAAAACA 
GCCTCAAGTTGCCAGGTCCCTTGCAGGAACAGACAGGCCTGGGGCCGCCCCACCTGGG 
CTCAGAGCTTGGGCTGCATGGAGGTGACACATGGGACTACAAGAGTCACGTGATGACC 
AAATTCGCTGAGGAGGAGGATGTACGTCGTAGTTTTGAAAACACTGCTGCTGACTGGC 
CGGAAATGCAAACGTTGGCTGGTGCTTTTGATTCAGACCGGTGGGGCTTCCGGCCTCG 
CACGGTGGTTCTGCACGGAAAGTCAGGAATTGGGAAATCGGCTCTAGCCAGAAGGATC 
GTGCTGTGCTGGGCGCAAGGTGGACTCTACCAGGGAATGTTCTCCTACGTCTTCTTCC 
TCCCCGTTAGAGAGATGCAGCGGAAGAAGGAGAGCAGTGTCACAGAGTTCATCTCCAG 
GGAGTGGCCAGACTCCCAGGCTCCGGTGACGGAGATCATGTCCCGACCAGAAAGGCTG 
TTGTTCATCATTGACGGTTTCGATGACCTGGGCTCTGTCCTCAACAATGACACAAAGC 
TCTG CAAAGACTGGGCTGAG AAGCAGCCTCCGTTCAC CCTCATACGCAGTCTG CTGAG 
GAAGGTCCTGCTCCCTGAGTCCTTCCTGATCGTCACCGTCAGAGACGTGGGCACAGAG 
AAGCTCAAGTCAGAGGTCGTGTCTCCCCGTTACCTGTTAGTTAGAGGAATCTCCGGGG 
AACAAAGAATCCACTTGCTCCTTGAGCGCGGGATTGGTGAGCATCAGAAGACACAAGG 
GTTGCGTGCGATCATGAACAACCGTGAGCTGCTCGACCAGTGCCAGGTGCCCGCCGTG 
GGCTCTCTCATCTGCGTGGCCCTGCAGCTGCAGGACGTGGTGGGGGAGAGCGTCGCCC 
CCTTCAACCAAACGCTCACAGGCCTGCACGCCGCTTTTGTGTTTCATCAGCTCACCCC 
TCGAGGCGTGGTCCGGCGCTGTCTCAATCTGGAGGAAAGAGTTGTCCTGAAGCGCTTC 
TGCCGTATGGCTGTGGAGGGAGTGTGGAATAGGAAGTCAGTGTTTGACGGTGACGACC 
TCATGGTTCAAGGACTCGGGGAGTCTGAGCTCCGTGCTCTGTTTCACATGAACATCCT 
TCTCCCAGACAGCCACTGTGAGGAGTACTACACCTTCTTCCACCTCAGTCTCCAGGAC 
TTCTGTGCCG C CTTGT ACTACGTGTT AGAGGGC CTGG AAATCGAGCCAGCTCT CTGCC 
CTCTGTACGTTGAGAAGACAAAGAGGTCCATGGAGCTTAAACAGGCAGGCTTCCATAT 
CCACTCGCTTTGGATGAAGCGTTTCTTGTTTGGCCTCGTGAGCGAAGACGTAAGGAGG 
CCACTGGAGGTCCTGCTGGGCTGTCCCGTTCCCCTGGGGGTGAAGCAGAAGCTTCTGC 
ACTGGGTCTCTCTGTTGGGTCAGCAGCCTAATGCCACCACCCCAGGAGACACCCTGGA 
CGCCTTCCACTGTCTTTTCGAGACTCAAGACAAAGAGTTTGTTCGCTTGGCATTAAAC 
AGCTTCCAAG AAGTGTGGCTT CCG ATTAACCAGAACCTGGACTTGAT AG CATCTTCCT 
TCTGCCTCCAGCACTGTCCGTATTTGCGGAAAATTCGGGTGGATGTCAAAGGGATCTT 
CCCAAGAGATGAGTCCGCTGAGGCATGTCCTGTGGTCCCTCTATGGATGCGGGATAAG 
ACCCTCATTGAGGAGCAGTGGGAAGATTTCTGCTCCATGCTTGGCACCCACCCACACC 
TGCGGCAGCTGGACCTGGG CAG CAGCATCCTGACAGAGCGGG CCATG AAG ACCCTGTG 
TGCCAAGCTGAGGCATCCCACCTGCAAGATACAGACCCTGATGTTTAGAAATGCACAG 
ATTACCCCTGGTGTG CAGC ACCT CTGGAGAATCGTCATGGC CAACCGTAACCTAAGAT 
CCCTCAACTTGGGAGGCACCCACCTGAAGGAAGAGGATGTAAGGATGGCGTGTGAAGC 
CTTAAAACACCCAAAATGTTTGTTGGAGTCTTTGAGGCTGGATTGCTGTGGATTGACC 
CATGCCTGTTACCTG AAG ATCTC C CAAATCCTT ACGACCTCC CCCAGCCTGAAAT CTC 
TGAGCCTGGCAGGAAACAAGGTGACAGACCAGGGAGTAATGCCTCTCAGTGATGCCTT 
GAGAGTCTCCCAGTGCGCCCTGCAGAAGCTGATACTGGAGGACTGTGGCATCACAGCC 
ACGGGTTGCC AGAGTCTGG CCTCAG C CCTCGTCAGCAACCGGAGCTTG ACAC AC CTGT 
GCCTATCCAACAACAGCCTGGGGAACGAAGGTGTAAATCTACTGTGTCGATCCATGAG 
GCTTCCCCACTGTAGTCTGCAGAGGCTGATGCTGAATCAGTGCCACCTGGACACGGCT 
GGCTGTGGTTTTCTTGCACTTGCGCTTATGGGTAACTCATGGCTGACGCACCTGAGCC 
TTAGCATGAAC CCTGTGGAAG ACAATGG CGTG AAGCTTCTGTGCG AGGTCATGAGAG A 
ACCATCTTGTCATCTCCAGGACCTGGAGTTGGTAAAGTGTCATCTCACCGCCGCGTGC 
TGTGAGAGT CTGTCCTGTGTGAT CTCG AGG AGCAG ACACCTG AAG AGCCTGGATCTCA 
CGGACAATGCCCTGGGTGACGGTGGGGTTGCTGCACTGTGCGAGGGACTGAAGCAAAA 
GAACAGTGTTCTGACGAGACTCGGGTTGAAGGCATGTGGACTGACTTCTGATTGCTGT 
GAGGCACTCTCCTTGGCCCTTTCCTGCAACCGGCATCTGACCAGTCTAAACCTGGTGC 
AGAATAACTTCAGTCCCAAAGGAATGATGAAGCTGTGTTCGGCCTTTGCCTGTCCCAC 
GTCTAACTTACAGATAATTGGGCTGTGGAAATGGCAGTACCCTGTGCAAATAAGGAAG 
CTGCTGGAGGAAGTGCAGCTACTCAAGCCCCGAGTCGTAATTGACGGTAGTTGGCATT 
CTTTTGATG AAG ATG AC CG GT ACTGG TGG AAAAA CT GAAG AT ACGG AAACCTGC C CCA 
CT CACACCCAT CTG ATGGAGGAACTTT AAACGCTGT 




ORF Start: ATG at 69 


ORF Stop: TGA at 3168 




SEQ ID NO: 348 


1033 aa 


MW at 116310.7kD 


NOV125a, 

CG59991-01 Protein Sequence 


MG P PFSTRETSTLCE PKGRRLRPRQRRNQENVTKNS LKLPGPLQEQTGLGP PHLGSEL 
GLHGGDTWDYKSHVMTKFAEEEDVRRSFEhrTAAI)WPEMQTLAGAFDSDRWGFRPRTVV 
LHGKSGIGKSALARR I VLCWAQGGLYQGMFS YVFFLPVREMQRKKES S VTEF I SREWP 
DSQAPVTEIMSRPERLLFIIDGFDDLGSVLNNDTKLCKDWAEKQPPFTLIRSLLRKVL 
LPESFLIVTVRDVGTEKLKSEWSPRYLLVRGISGEQRIHLLLERGIGEHQKTQGLRA 
IMNNRELLDQCQVPAVGSLrCVALQLQDWGESVAPFNQTLTGLHAAFVFHQLTPRGV 
VRRCl^LEERWLKRFCRMAVEGVV^RKSVFDGDDLMVQGLGESELRALFHMNILLPD 
SHCEEYYTFFHLSLQDFCAALYYVLEGLEIEPALCPLYVEKTKRSMELKQAGFHIHSL 
WMKRFLFGLVSEDVRRPLEVLLGCPVPLGVKQKLLHWVSLLGQQPNATTPGDTLDAFH 
CLFETQDKEFVRLALNSFQEVWLPINQNLDLIASSFCLQHCPYLRKIRVDVKGIFPRD 
ESAEACPWPLWMRDKTLI EEQWEDFCSMLGTHPHLRQLDLGSS I LTERAWKTLCAKL 
RH PTC K I QT LM FRNAQ I TPG VQH LWR I VMANRNLRS LNLGGTH L KEED VRMAC EALKH 
PKCLLESLRLDCCGLTHACYLKISQILTTSPSLKSLSLAGNKVTDQGVMPLSDALRVS 
QCALQKL I LEDCG I T ATGCQS LAS ALVSNRS LTHLC LSNNS IX3N EG VNLLCRSMRL PH 
CS LQRLNLNQCHLDT AG CG FLALALMGNS WLTH LS LS MN P VEDNG VKLLC EVMRE PS C 
HLQDLELVKCHLTAACCESLSCVISRSRHLKSLDLTDNALGDGGVAALCEGLKQKNSV 
LTRLGL KACGLTSDCCE ALS LALS CNRHLTS LNLVQNNFS P KGMMKLCS AFAC PT SNL 
QIIGLWKWQYPVQIRKLLEEVQLLKPRWIDGSWHSFDEDDRYWWKN 



Further analysis of the NOV125a protein yielded the following properties shown in 
Table 125B. 



Table 125B. Protein Sequence Properties NOV125a 


PSort 
analysis: 


0.7600 probability located in nucleus; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV125a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 125C. 



Table 125C. Geneseq Results for NOV125a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent 
#, Date] 


NOV125a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAE07514 


Human PYRIN-1 protein - Homo 
sapiens, 1034 aa. [WO200161005-A2, 
23-AUG-2001] 


103..934 
207..1003 


276/843 (32%) 
445/843 (52%) 


e-126 


AAE07513 


Human nucleotide binding site 1 
(NBS-1) protein - Homo sapiens, 
1033 aa. [WO200161005-A2, 23- 
AUG-2001] 


114..935 
180..990 


281/839(33%) 
431/839(50%) 


e-120 


AAU07878 


Polypeptide sequence for mammalian 
Spg65 - Mammalia, 748 aa. 
[WO200166752-A2, 13-SEP-2001] 


207..963 
9..748 


218/766(28%) 
380/766 (49%) 


7e-95 


AAE06758 


Human G-protein coupled receptor-8 
(GCREC-8) protein - Homo sapiens, 
1473 aa. [WO200157085-A2, 09- 
AUG-2001] 


21..764 
219..959 


235/772 (30%) 
380/772(48%) 


3e-88 


AAB62571 


Human CARD-7 polypeptide - Homo 
sapiens, 1429 aa. [WO200130813-A1, 
03-MAY-2001] 


21..764 
219..959 


235/772 (30%) 
380/772 (48%) 


3e-88 
In a BLAST search of public sequence databases, the NOV125a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 125D. 



Table 125D. Public BLASTP Results for NOV125a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV125a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9JLR2 


MATERNAL-ANTIGEN-THAT- 
EMBRYOS-REQUIRE PROTEIN - 
Mus musculus (Mouse), 1111 aa. 


24..1033 
104..1111 


548/1019(53%) 
716/1019(69%) 


0.0 


Q9R1M5 


MATER PROTEIN - Mus musculus 
(Mouse), 1111 aa. 


24..1033 
104..1111 


547/1019 (53%) 
716/1019 (69%) 


0.0 


AAL35293 


NALP4 - Homo sapiens (Human), 994 
aa. 


63..958 
94..981 


291/907 (32%) 
473/907 (52%) 


e-133 


Q96MN2 


CDNA FLJ32126 FIS, CLONE 
PEBLM2000112, WEAKLY 
SIMILAR TO HOMO SAPIENS 
NUCLEOTIDE-BINDING SITE 
PROTEIN 1 MRNA - Homo sapiens 
(Human), 919 aa. 


63..958 
19..906 


291/907 (32%) 
473/907 (52%) 


e-133 


AAL12497 


CRYOPYRIN - Homo sapiens 
(Human), 1034 aa. 


103..934 
207..1003 


276/843 (32%) 
445/843 (52%) 


e-125 



PFam analysis predicts that the NOV125a protein contains the domains shown in 
Table 125E. 



Table 125E. Domain Analysis of NOV125a 


Pfam Domain 


NOV125a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


LRR: domain 1 of 6 


671..695 


6/25 (24%) 
16/25 (64%) 


1.6e+02 


LRR: domain 2 of 6 


728..752 


7/27 (26%) 
17/27 (63%) 


2.3e+02 


LRR: domain 3 of 6 


785..809 


7/26 (27%) 
19/26 (73%) 


1.6e+02 


LRR: domain 4 of 6 


814..836 


14/25 (56%) 


4.3e+02 




LRR: domain 5 of 6 


899..923 


8/26(31%) 
20/26 (77%) 


27 


LRR: domain 6 of 6 


956..977 


7/25 (28%) 
16/25 (64%) 


2.9e+02 



Example 126. 

The NOV126 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 126A. 



Table 126A. NOV126 Sequence Analysis 




SEQ ID NO: 349 


2310 bp 


NOV126a, 

CG59987-01 DNA Sequence 


CCGCGCCTCAGTCCGCCGTCCGCCCTCCGCGCCCGCGCCGCTAGCATGACCGACGCGC 


TGTTG CCCGCGG CCCCCCAGCCGC TGG AGAAGG AG AACGACGG CT ACTTTCGG AAGGG 
CTGTAATCCCCTTGCACAAACCGGCCGGAGTAAATTGCAGAATCAAAGAGCTGCTTTG 
AATCAGCAGATC CTGAAAG CCGTGCGG ATGAGGAC CGG AGCGGAAAACCTTCTGAAAG 
TGGCCACAAACTCAAAGGTGCGGGAGCAAGTGCGGCTGGAGCTGAGCTTCGTCAACTC 
AGACCTGCAGATGCTCAAGGAAGAGCTGGAGGGGCTGAACATCTCGGTGGGCGTCTAT 
CAGAACACAGAGGAGGCATTTACGATTCCCCTGATTCCTCTTGGCCTGAAGGAAACGA 
AAGACGTCGACTTTGCAGTCGTCCTCAAGGATTTTATCCTGGAACATTACAGTGAAGA 
TGGCTATTTATATGAAGATGAAATTGCAGATCTTATGGATCTGAGACAAGCTTGTCGG 
ACGCCTAGCCGGGATGAGGCCGGGGTGGAACTGCTGATGACATACTTCATCCAGCTGG 
GCITTGTCGAGAGTCGATTCTTCCCGCCCACACGGCAGATGGGACTCCTGTTCACCTG 
GTATGACTCTCTCACCGGGGTTCCGGTCAG CCAGCAGAACCTG CTGCTGG AG AAGGCC 
AGTGTCCTGTTCAACACTGGGGCCCTCTACACCCAGATTGGGACCCGGTGTGATCGGC 
AG ACG CAGGCTGGG CTGGAGAGTGCCAT AG ATGCCTTTCAGAGAGCCGCAGGGGTTTT 
AAATTACCTGAAAGACACATTTACCCATACTCCAAGTTACGACATGAGGCCTGCCATG 
CTCAGCGTGCTCGTCAAAATGATGCTTGCACAAGCCCAAGAAAGCGTGTTTGAGAAAA 
TCAGCCTTCCTGGGATCCGGAATGAATTCTrCATGCTGGTGAAGGTGGCTCAGGAGGC 
TGCTAAGGTGGGAG AGGTCTAC CAACAGCT ACACG C AGCCATG AG CCAGGCGC CGGTG 
AAAGAGAACATCCCCTACT CCTGGGCCAG CTTAGCCTGCGTGAAGGCCCACCACTACG 
CGGCCCTGGCCCACTACTT CACTGCCATCCTCCTCATCGACCACCAGGTGAAG CCAGG 
CACGGATCTGGACCACCAGGAGAAGTGCCTGTCCCAGCTCTACGACCACATGCCAGAG 
GGGCTGACACCCTTGGCCACACTGAAGAATGATCAGCAGCGCCGACAGCTGGGGAAGT 
CCCACTTGCGCAGAGCCATGGCTCATCACGAGGAGTCGGTGCGGGAGGCCAGCCTCTG 
CAAGAAGCTGCGGAGCATTGAGGTGCTACAGAAGGTGCTGTGTGCCGCACAGGAACGC 
TCCCGGCTCACGTACGCCCAGCACCAGGAGGAGGATGACCTGCTGAACCTGATCGACG 
C CCCCAGAGTGTTGTTG CT AAAACTG AGCAAG AGGTTG ACATT ATATTG CCCC AGTTC 
T CC AGCTG ACAGTCACGG ACTTCTTCCAG AAGCTGGGCCCTTATCTGTGCTGT CGG CT 
AACAAGCGGTGGACGCCTCCTCGAAGCATCCGCTTCACTGCAGAAGAAGGGGACTTGG 
GGTTCACCTTGAGAGGGAACX3CCCCCGTTCAGGTTCACTTCCTGGATCCTTACTGCTC 
TG C CTCGGTGG CAGG AGCCCGGG AAGG AG ATT AT ATTGTCTCCATTC AG CTTGTGG AT 
TGT AAGTGG CTGACG CTGAGTG AGGTTATGAAGCTGCTG AAGAGCTTTGGCGAGG ACG 
AGATCGAGATGAAAGTCGTGAGCCTCCTGGACTCCACATCATCCATGCATAATAAGAG 
TGCCACATACTCCGTGGGAATGCAGAAAACGTACTCCATGATCTGCTTAGCCATTGAT 
GATGACGACAAAACTGATAAAACCAAGAAAATCTCCAAGAAGCTTTCCTTCCTGAGTT 
GGGGCACCAACAAGAACAGACAGAAGTCAGCCAGCACCTTGTGCCTCCCATCGGTCGG 
GGCTGCACGGCCTCAGGTCAAGAAGAAGCTGCCCTCCCCTTTCAGCCTTCTCAACTCA 
GAC AGTTCTTGGTACT AATGTGAGGAAACAAACATGTTCAGGC CCCGAACATTTC CGG 


TGCTGACTCGGCCTTAAACGTTTGTGCCATAATGGAAAATATCTATCTATCTGTTCTC 


AAAT C CTG TTT TT CT CAT AG TGT AAACT C ACATTTG ATG TG TTTTT ATG AAGG AAAG T 


AACCAAGAAACCTCTAGGAATTAGTGAAAAAAGAACTTTTTTGAGGTG 




ORF Start: ATG at 46 


ORF Stop: TAA at 2104 




SEQ ID NO: 350 


686 aa MW at 76812.3kD 


NOV126a, 

CG59987-01 Protein Sequence 


MTDALLPAAPQPLEKENtXSYFRKGCNPLAQTGRSKLQNQRAALNQQILKAVRNRTGAE 
NLLKVATNSKVREQVRLELSFVNSDLQMLKEELEGLNISVGVYQNTEEAFTIPLIPLG 
LKETKDVDFAWLKDF I LEHYS EDGYLYEDE I ADLMDLRQACRTPS RDEAG VELLMTY 
FIQLGFVESRFFPPTRQMGLLFTVryDSLTGVPVSQQNLLLEKASVLFNTGALYTQIGT 
RCDRQTQAGLE S AIDAFQRAAGVLNYLKDTFTHTPS YDMS PAMLSVLVKMMLAQAQES 
VFEKI SLPGIRNEFFMLVKVAQEAAKVGEVYQQLHAAMSQAPVKENI PYSWASIiACVK 
AHHYAALAHYFTAI LLI DHQVKPGTDLDHQE KCLSQLYDHM PEG LT PLAT LKNDQQRR 
QLGKSHLRRAMAHHEESVREASLCKKLRSIEVLQKVtiCAAQERSRLTYAQHQEEDDLL 
N L I DA PR VLLLKLS KRLTLYC PS S P ADSHG LLPEAGPLS VLS AN KRWT P P RS I R FT AE 
EGDLGFTLRGNAP VQVHFLD P YCS AS VAGAREGDY I VS I QLVDCKMLTLSEVMKLLKS 
FGEDEIEMKVVSLLDSTSSMHNKSATYSVGMQKTYSMICLAIDDDDKTDKTKKISKKL 
SFLSWGTNKNRQKSASTLCLPSVGAARPQVKKKLPSPFSLLNSDSSWY 




SEQ ID NO: 351 


2109 bp 


NOV126b, 

CG59987-02 DNA Sequence 


CGCCGCTAGCATGACCGACGCGCTGTTGCCCGCGGCCCCCCAGCCGCTGGAGAAGGAG 
AACGACGGCTACTTTCGG AAGGG CTGT AATCCCCTTG C ACAAAC CGG CCGG AGTAAAT 
TG CAGAATCAAAG AGCTGCTTTG AATCAGCAGATCCTG AAAG CCGTGCGG ATGAGG AC 
CGGAGCGGAAAACCTTCTGAAAGTGGCCACAAACTCAAAGGTGCGGGAGCAAGTGCGG 
CTGGAGCTGAGCTTCGTCAACTCAGACCTGCAGATGCTCAAGGAAGAGCTGGAGGGGC 
TGAACATCTCGGTGGGCGTCTATCAGAACACAGAGGAGGCATTTACGATTCCCCTGAT 
TCCTCTTGGCCTGAAGGAAACGAAAGACGTCGACTTTGCAGTCGTCCTCAAGGATTTT 
ATCCTGGAACATTACAGTGAAGATGGCTATTTATATGAAGATGAAATTGCAGATCTTA 
TGGATCTG AG ACAAGCTTGTCGGACGCCT AG CCGGGATGAGG CCGGGGTGGAACTGCT 
GATGACATACTTCATCCAGCTGGGCTTTGTCGAGAGTCGATTCTTCCCGCCCACACGG 
CAGATGGGACTCCTGTTCACCTGGTATGACTCTCTCACCGGGGTTCCGGTCAGCCAGC 
AGAACCTG CTGCTGGAG AAGG CCAGTGTCCTGTTCAACACTGGGGCCCTCTACACCCA 
GATTGGGACCCGGTGCGATCGGCAGACGCAGGCTGGGCTGGAGAGTGCCATAGATGCC 
TTTCAGAGAGCCGCAGGGGTTTTAAATTACCTGAAAGACACATTTACCCATACTCCAA 
GTTACGACATGAGCCCTGCCATGCTCAGCGTGCTCGTCAAAATGATGCTTGCACAAGC 
CCAAGAAAGCGTGTTTGAGAAAATCAGCCTTCCTGGGATCCGGAATGAATTCTTCATG 
CTGGTG AAGGTGGCTC AGGAGG CTGCTAAGGTGGGAGAGGTCT ACCAACAG CT ACACG 
CAGCCATGAGCCAGGCGCCGGTGAAAGAGAACATCCCCTACTCCTGGGCCAGCTTAGC 
CTGCGTGAAGGCCCACCACTACGCGGCCCTGGCCCACTACTTCACTGCCATCCTCCTC 
ATCGACCACCAGGTGAAGCCAGGCACGGATCTGGACCACCAGGAGAAGTGCCTGTCCC 
AG CT CT ACG AC CACATGCCAGAGGGGCTGACACCCTTGG CCACACTGAAG AATG AT CA 
GCAGCGCCG AC AGCTGGGGAAGTCCCACTTGCGCAGAG CCATGG CT CATCACGAGG AG 
TCGGTGCGGGAGGCAAGCCTCTGCAAGAAGCTGCGGAGCATTGAGGTGCTACAGAAGG 
TGCTGTGTGCCGCACAGGAACGCTCCCGGCTCACGTACGCCCAGCACCAGGAGGAGGA 
TGACCTG CTG AACCTGATCG ACGCCCCCAGTGTTGTTG CT AAAACTG AGCAAGAGGTT 
GACATTATATTGCCCCAGTTCTCCAAGCTGACAGTCACGGACTTCTTCCAGAAGCTGG 
G CC C CTT AT CTGTGTTTT CGG CTAACAAG CGG TG G ACG C CT CCT CG AAG CAT C CG CT T 
CACTGCAGAAGAAGGGGACTTGGGGTTCACCTTGAGAGGGAACGCCCCCGTTCAGGTT 
CACTTCCTGGATCCTTACTGCTCTGCCTCGGTGGCAGGAGCCCGGGAAGGAGATTATA 
TTGTCTCCATTCAGCTTGTGGATTGTAAGTGGCTGACGCTGAGTGAGGTTATGAAGCT 
GCTGAAGAG CTTTGGCGAGGACGAGATCG AG ATG AAAGTCGTG AGCCTC CTGGACTCC 
ACATCATCCATGCATAATAAGAGTGCCACATACTCCGTGGGAATGTAGAAAACGTACT 
CCATG ATCTG CTT AG CCATTG ATG ATGACG ACAAAACTGATAAAACCAAG AAAATCTC 


CAAGAAGCTTTCCTTCCTGAGTTGGGGCACCAACAAGAACAGACAGAAGTCAGCCAGC 


ACCTTGTGCCTCCCATCGGTCGGGGCTGCACGGCCTCAGGTCAAGAAGAAGCTGCCCT 


CCCCTTTCAGCCTTCTCAACTCAGACAGTTCTTGGTACTAATGTGAGGAAACAAACAT 


GTTCAGGCCCCGAACATTTCC 




ORF Start: ATG at 11 


ORF Stop: TAG at 1844 




SEQ ID NO: 352 


611 aa 


MW at 68613.9kD 


NOV126b, 

CG59987-02 Protein Sequence 


MTDALLPAAPQPLEKENDGYFRKGCNPLAQTGRSKLQNQRAALNQQILKAVRMRTGAE 
NLLKVATNSKVREQVRLELSFVNSDLQMLKEELEGLNISVGVYQNTEEAFTIPLIPLG 
LKETKDVDFAVVLKDFILEHYSEDGYLYEDEIADLMDLRQACRTPSRDEAGVELLMTY 
FIQLGFVESRFFPPTRQMGLLFTWYDSLTGVPVSQQNLLLEKASVLFNTGALYTQIGT 
RCDRQTQAGLESAIDAFQRAAGVLNYLKDTFTHTPSYDMSPAMLSVLVKMMLAQAQES 
VFEKISLPGIRNEFFMLVKVAQEAAKVGEVYQQLHAAMSQAPVKENIPYSWASLACVK 
AHHYAALAHYFTAILLIDHQVKPGTDLDHQEKCLSQLYDHMPEGLTPLATLKNDQQRR 
QLGKSHLRRAMAHHEESVREASLCKKLRSIEVLQKVLCAAQERSRLTYAQHQEEDDLL 
NLIDAPSVVAKTEQEVDIILPQFSKLTVTDFFQKLGPLSVFSANKRWTPPRSIRFTAE 
EGDLGFTLRGNAPVQVHFLDPYCSASVAGAREGDYIVSIQLVDCKWLTLSEVMKLLKS 
FGEDEIEMKVVSLLDSTSSMHNKSATYSVGM 
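
The ORF coordinates and protein length listed above can be cross-checked with simple arithmetic: the number of codons between the start ATG and the stop codon should equal the reported number of amino acids. The following is a minimal sketch, assuming 1-based nucleotide positions and assuming that the stop position refers to the first base of the stop codon; the function name is illustrative and not part of the original disclosure.

```python
def orf_protein_length(orf_start: int, orf_stop: int) -> int:
    """Return the number of amino acids encoded between start and stop.

    Assumption: 1-based coordinates, with orf_start at the 'A' of the ATG
    and orf_stop at the first base of the stop codon (not translated).
    """
    coding_nt = orf_stop - orf_start
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return coding_nt // 3


# CG59987-02: ORF Start ATG at 11, ORF Stop TAG at 1844, reported as 611 aa
print(orf_protein_length(11, 1844))  # prints 611
```

Under these assumptions the same check can be applied to any other clone whose ORF start, ORF stop and protein length are all listed.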



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 126B. 

Table 126B. Comparison of NOV126a against NOV126b. 

Protein Sequence | NOV126a Residues / Match Residues | Identities / Similarities for the Matched Region 
NOV126b | 1..611 / 1..611 | 585/612 (95%) / 590/612 (95%) 



Further analysis of the NOV126a protein yielded the following properties shown in 
Table 126C. 

Table 126C. Protein Sequence Properties NOV126a 

PSort analysis: 0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 

SignalP analysis: No Known Signal Sequence Predicted 



A search of the NOV126a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 126D. 



Table 126D. Geneseq Results for NOV126a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV126a Residues / Match Residues | Identities / Similarities for the Matched Region | Expect Value 
AAU10192 | Human prostate specific protein PSL22 - Homo sapiens, 686 aa. [WO200172962-A2, 04-OCT-2001] | 1..686 / 1..686 | 660/687 (96%) / 665/687 (96%) | 0.0 
AAB68561 | Human GTP-binding associated protein #61 - Homo sapiens, 666 aa. [WO200105970-A2, 25-JAN-2001] | 27..686 / 7..666 | 626/661 (94%) / 633/661 (95%) | 0.0 
AAG64579 | Human transcription termination factor binding protein 54 - Homo sapiens, 488 aa. [CN1297918-A, 06-JUN-2001] | 201..686 / 3..488 | 458/487 (94%) / 464/487 (95%) | 0.0 
AAB29661 | Human histidine domain-protein tyrosine phosphatase, SEQ ID NO:2 - [WO200063392-A1, 26-OCT-2000] | 110..357 / 7..253 | 82/252 (32%) / 135/252 (53%) | 3e-28 
AAU00869 | Human cancer related protein 5 - Homo sapiens, 257 aa. [WO200118014-A1, 15-MAR-2001] | 409..597 / 8..196 | 70/189 (37%) / 102/189 (53%) | 2e-27 



In a BLAST search of public sequence databases, the NOV126a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 126E. 

Table 126E. Public BLASTP Results for NOV126a 

Protein Accession Number | Protein/Organism/Length | NOV126a Residues / Match Residues | Identities / Similarities for the Matched Portion | Expect Value 
Q96RU1 | RHOPHILIN-LIKE protein - Homo sapiens (Human), 685 aa. | 1..686 / 1..685 | 627/688 (91%) / 640/688 (92%) | 0.0 
Q9DBN2 | 1300002E07RIK PROTEIN - Mus musculus (Mouse), 686 aa. | 1..686 / 1..686 | 573/687 (83%) / 616/687 (89%) | 0.0 
Q61085 | GTP-RHO binding protein 1 (Rhophilin) - Mus musculus (Mouse), 643 aa. | 16..596 / 20..580 | 273/583 (46%) / 361/583 (61%) | e-135 
Q9XYY9 | RHOPHILIN - Drosophila melanogaster (Fruit fly), 718 aa. | 21..615 / 31..674 | 248/654 (37%) / 363/654 (54%) | e-110 
Q96PV9 | KIAA1929 PROTEIN - Homo sapiens (Human), 410 aa (fragment). | 23..366 / 17..362 | 178/346 (51%) / 241/346 (69%) | 1e-93 



PFam analysis predicts that the NOV126a protein contains the domains shown in 
Table 126F. 

Table 126F. Domain Analysis of NOV126a 

Pfam Domain | NOV126a Match Region | Identities / Similarities for the Matched Region | Expect Value 
HR1: domain 1 of 1 | 38..110 | 19/87 (22%) / 53/87 (61%) | 1.2e-05 
BRO1: domain 1 of 1 | 111..263 | 60/172 (35%) / 125/172 (73%) | 3.8e-56 
PDZ: domain 1 of 1 | 516..593 | 20/84 (24%) / 53/84 (63%) | 0.46 






WO 02/072757 PCT/US02/06908 
Example 127. 

The NOV127 clone was analyzed, and the nucleotide and predicted polypeptide 
sequences are shown in Table 127A. 



Table 127A. NOV127 Sequence Analysis 




SEQ ID NO: 353 


3351 bp 


NOV127a, 

CG59971-01 DNA Sequence 


CGTCCCGTGGCCATGACGACCGCTCAGAGGGACTCCCTGTTGTGGAAGCTCGCGGGGT 
TGCTGCGGGAGTCCGGTGATGTGGTCCTGTCTGGCTGTAGCACCCTGAGCCTGCTGAC 
TCCCACACTGCAACAGCTGAACCACGTATTTGAGCTGCACCTGGGGCCATGGGGCCCT 
GG CCAGACAGGCTTTGTGGCTCTGCCCTCCCATC CTG CCGACTC CCCTGTTATTCTTC 
AGCTTCAGTTTCTCTTCGATGTGCTGCAGAAAACACTTTCACTCAAGCTGGTCCATGT 
TGCTGGTCCTGGCCCCACAGGGCCCATCAAGATTTTCCCCTTCAAATCCCTTCGGCAC 
CTGGAGCTCCGAGGTGTTCCCCTCCACTGTCTGCATGGCCTCCGAGGCATCTACTCCC 
AGCTGGAGACCCTGATTTGCAGCAGGAGCCTCCAGGCATTAGAGGAGCTCCTCTCAGC 
CTGCGGCGGCGACTTCTGCTCTGCCCTCCCTTGGCTGGCTCTGCTTTCTGCCAACTTC 
AGCTACAATGCACTGACCGCCTTAGACAGCTCCCTGCGCCTCTTGTCAGCTCTGCGTT 
TCTTGAACCTAAGCCACAATCAAGTCCAGGACTGTCAGGGATTCCTGATGGATTTGTG 
TGAGCTCCACCATCTGGACATCTCCTATAATCGCCTGCATTTGGTGCCAAGAATGGGA 
CCCTCAGGGGCTGCTCTGGGGGTCCTGATACTGCGAGGCAATGAGCTTCGGAGCCTGC 
CAGGCCT AG AGCAG CTG AGGAATCTG CGG CACCTGG ATTT GG CAT ACAAC CTG CTGG A 
AGGACACCGGGAGCTGTCACCACTGTGGCTGCTGGCTGAGCTCCGCAAGCTCTACCTG 
G AGGGGAACCCTCTTTGGTTC CACCCTGAGCACCGAG CAG CCACTGCCCAGTACTTGT 
CACCCCGGGCCAGGGATGCTGCTACTGGCTTCCTTCTCGATGGCAAGGTCTTGTCACT 
GACAGATTTTCAGCAGACTCACACATCCTTGGGGCTCAGCCCCATGGGCCCACCTTTG 
CCCTGGCCAGTGGGG AGT ACTCCTG AAACCTCAGGTGG CC CTG ACCTGAGTGACAG CC 
TCTCCTCAGGGGGTGTTGTGACCCAGC CCCTGCTTCAT AAGGTTAAGAGCCG AGTC CG 
TGTGAGGCGGGCAAGCATCTCTGAACCCAGTGATACGGACCCGGAGCCCCGAACTCTG 
AACCCCTCTCCGGCTGGTTGGTTCGTGCAGCAGCACCCGGAGCTGGAGCTCATGAGCA 
GCTTCCGGGAACGGTTCGGCCGCAACTGGCTGCAGTACAGGAGTCACCTGGAGCCCTC 
CGGAAACCCTCTGCCGGCCACCCCCACTACTTCTGCACCCAGTGCACCTCCAGCCAGC 
TCCCAGGGCCCCGACACTGCACCCAGACCTTCACCCCCGCAGGAGGAAGCCAGAGGCC 
CCCAGGAGTCACCACAGAAAATGTCAGAGGAGGTCAGGGCGGAGCCACAGGAGGAGGA 
AGAGGAGAAGGAGGGGAAGGAGGAGAAGGAGGAGGGGGAGATGGTGGAACAGGGAGAA 
G AGG AGG CAGGAGAGG AGGAAG AAGAGGAGCAGGACCAG AAGGAAGTGGAAG CGG AAC 
TCTGTCGCCCCTTCTTGGTGTGTCCCCTGGAGGGGCCTGAGGGCGTACGGGGCAGGGA 
ATGCTTTCTCAGGGTCACTTCTGCCCACCTGTTTGAGGTGGAACTCCAAGCAGCTCGC 
ACCTTGGAGCGACTGG AG CTCCAGAGTCTGGAGGCAGCTGAGATAGAG CCGG AGG CCC 
AGGCCCAGGGTCCCCCTCTTGCTGCGCAGGGCTCAGATCTGCTCCCTGGAGCCCCCAT 
CCTCAGTCTGCGCTTCTCCTACATCTGCCCTGACCGGCAGTTGCGTCGCTATTTGGTG 
CTGG AGCCTGATGCCCACGCAGCTGTCCAGGAGCTGCTTGCCGTGTTGAC CC CAGTCA 
CCAATGTGGCTCGGGAACAGCTTGGGGAGGCCAGGGACCTCCTGCTGGGTAGATTCCA 
GTGTCTACGCTGTGGCCATGAGTTCAAGCCAGAGGAGCCCAGGATGGGATTAGACAGT 
GAGGAAGGCTGGAGGCCTCTGTTCCAAAAGACAGAATCTCCTGCTGTGTGTCCTAACT 
GTGGTAGTGACCACGTGGTTCTCCTCGCTGTGTCTCGGGGAACCCCCAACAGGGAGCG 

CCTGGC CATGGTG ACCACCTTGACAGGGC CAAGAACAGCCCACCTCAGGCACCGAGCA 

CCCGTGACCATGGTAGTTGGAGCCTCAGTCCCGCCCCTGAGCGCTGTGGCCTCCGCTC 

TGTGGACCACCGACTCCGGCTCTTCCTGGATGTTGAGGTGTTCAGCGATGCCCAGGAG 

GAGTTCCAGTGCTGCCTCAAGGTCCCAGTGGCATTGGCAGGCCACACTGGGGAGTTCA 

TGTGCCTTGTGGTTGTGTCTGACCGCAGGCTGTACCTGTTGAAGGTGACTGGGGAGAT 

GAGTGAGCCTCCAGCTAGCTGGCTGCAGCTGACCCTGGCTGTTCCCCTGCAGGATCTG 

AGTGGCATAGAGCTGGGCCTGGCAGGCCAGAGCCTGCGGCTAGAGTGGGCAGCTGGGG 

CGGG CCGCTGTGTGCTG CTGCCCCG AG ATGCCAGG CATTGCCGGGCCTTCCTAGAGG A 

GCTCCTTGGTGTCTTGCAGTCTCTGCCCCCTGCCTGGAGGAACTGTGTCAGTGCCACA 

GAGGAGGAGGTCACCCCCCAGCACCGGCTCTGGCCATTGCTGGAAAAAGACTCATCCT 

TGGAGGCTCGCCAGTTCTTCTACCTTCGGGCGTTCCTGGTTGAAGGTGAAGCCTCTGT 

GCAGCTGATGCTTCCCTCCACCTGCCTCGTATCCCTGTTGCTGACTCCGTCCACCCTG 

TTC CTGTTAGATGAGGATGCTGCAGGGTCCCCGGC AGAG CCCTCTCCTCCAGCAG CAT 

CTGGCGAAGCCTCTGAGAAGGTGCCTCCCTCGGGGCCGGGCCCTGCTGTGCGTGTCAG 

GG AGCAGCAGCCACT CAG CAG C CTG AG CTCCGTGCTG CTCTACCGCTCAGCCC CTG AG 

GACTTG CGGCTGCTCTTCT ACG ATGAGGTGTCC CGG CTGGAGAGCTTTTGGGCACT C C 

GTGTGGTGTGTCAGGAGCAGCTGACAGCCCTGCTTGCCTGGATCCGGGAACCATGGGA 

GGAG CTGTTTTCCATCGG ACTCCGGACAGTG ATCC AAG AGGCG CTGGCCCTTGACCGA 

TGAGGGTCCCACGCTGACCTTGGCCCTGACCTCAGGAGCCACGCT 




ORF Start: ATG at 13 


ORF Stop: TGA at 3307 




SEQ ID NO: 354 


1098 aa 


MW at 121004.1kD 


NOV127a, 

CG59971-01 Protein Sequence 


MTTAQRDSLLWKLAGLLRESGDVVLSGCSTLSLLTPTLQQLNHVFELHLGPWGPGQTG 
FVALPSHPADSPVILQLQFLFDVLQKTLSLKLVHVAGPGPTGPIKIFPFKSLRHLELR 
GVPLHCLHGLRGIYSQLETLICSRSLQALEELLSACGGDFCSALPWLALLSANFSYNA 
LTALDSSLRLLSALRFLNLSHNQVQDCQGFLMDLCELHHLDISYNRLHLVPRMGPSGA 
ALGVLILRGNELRSLPGLEQLRNLRHLDLAYNLLEGHRELSPLWLLAELRKLYLEGNP 
LWFHPEHRAATAQYLSPRARDAATGFLLDGKVLSLTDFQQTHTSLGLSPMGPPLPWPV 
GSTPETSGGPDLSDSLSSGGVVTQPLLHKVKSRVRVRRASISEPSDTDPEPRTLNPSP 
AGWFVQQHPELELMSSFRERFGRNWLQYRSHLEPSGNPLPATPTTSAPSAPPASSQGP 
DTAPRPSPPQEEARGPQESPQKMSEEVRAEPQEEEEEKEGKEEKEEGEMVEQGEEEAG 
EEEEEEQDQKEVEAELCRPLLVCPLEGPEGVRGRECFLRVTSAHLFEVELQAARTLER 
LELQSLEAAEIEPEAQAQGPPLAAQGSDLLPGAPILSLRFSYICPDRQLRRYLVLEPD 
AHAAVQELLAVLTPVTNVAREQLGEARDLLLGRFQCLRCGHEFKPEEPRMGLDSEEGW 
RPLFQKTESPAVCPNCGSDHVVLLAVSRGTPNRERKQGEQSLAPSPSASPVCHPPGHG 
DHLDRAKNSPPQAPSTRDHGSWSLSPAPERCGLRSVDHRLRLFLDVEVFSDAQEEFQC 
CLKVPVALAGHTGEFMCLVVVSDRRLYLLKVTGEMSEPPASWLQLTLAVPLQDLSGIE 
LGLAGQSLRLEWAAGAGRCVLLPRDARHCRAFLEELLGVLQSLPPAWRNCVSATEEEV 
TPQHRLWPLLEKDSSLEARQFFYLRAFLVEGEASVQLMLPSTCLVSLLLTPSTLFLLD 
EDAAGSPAEPSPPAASGEASEKVPPSGPGPAVRVREQQPLSSLSSVLLYRSAPEDLRL 
LFYDEVSRLESFWALRVVCQEQLTALLAWIREPWEELFSIGLRTVIQEALALDR 




SEQ ID NO: 355 


3348 bp 


NOV127b, 

CG59971-02 DNA Sequence 


CGTCCCGTGGCCATGACGACCGCTCAGAGGGACTCCCTGTTGTGGAAGCTCGCGGGGT 
TGCTG CGGG AGTCCGGTGATGTGGTCCTGTCTGGCTGTAGCACCCTGAGCCTG CTG AC 
TCCCACACTGCAACAGCTGAACCACGTATTTGAGCTGCACCTGGGGCCATGGGGCCCT 
GGCCAGACAGGCTTTGTGGCTCTGCCCTCCCATCCTGCCGACTCCCCTGTTATTCTTC 
AGCTTCAGTTTCTCTTCGATGTGCTGCAGAAAACACTTTCACTCAAGCTGGTCCATGT 
TGCTGGTCCTGGCCCCACAGGGCCCATCAAGATTTTCCCCTTCAAATCCCTTCGGCAC 
CTGGAGCTCCGAGGTGTTCCCCTCCACTGTCTGCATGGCCTCCGAGGCATCTACTCCC 
AGCTGGAGACCCTGATTTGCAGCAGGAGCCTCCAGGCATTAGAGGAGCTCCTCTCAGC 
CTGCGGCGGCGACTTCTGCTCTGCCCTCCCTTGGCTGGCTCTGCTTTCTGCCAACTTC 
AGCTACAATGCACTGACCGCCTTAGACAGCTCCCTGCGCCTCTTGTCAGCTCTGCGTT 
TCTTGAACCTAAGCCACAATCAAGTCCAGGACTGTCAGGGATTCCTGATGGATTTGTG 
TGAGCTC CACCATCTGGACATCT CCT AT AATCGCCTGCATTTGGTGCCAAGAATGGG A 
CCCTCAGGGGCTGCTCTGGGGGTCCTGATACTG CGAGGC AATGAGCTTCGGAG CCTG C 
CAGG C CT AG AG CAGCTG AGG AATCTG CGG CAC CTGG ATTTGGC AT ACAACCTG CTGG A 
AGGACACCGGGAGCTGTCACCACTGTGGCTGCTGGCTGAGCTCCGCAAGCTCTACCTG 
GAGGGGAACCCTCTTTGGTTCCACCCTGAGCACCGAGCAGCCACTGCCCAGTACTTGT 
CACCCCGGGCCAGGGATGCTGCTACTGGCTTCCTTCTCGATGGCAAGGTCTTGTCACT 
GACAGATTTTCAGCAGACTCACACATCCTTGGGGCTCAGCCCCATGGGCCCACCTTTG 
CCCTGG GCAGTGGGGAGTACTCCTGAAACCTCAGGTGG CCCTGACCTGAGTGACAGCC 
TCTCCTCAGGGGGTGTTGTGACCCAGCCCCTGCTTCATAAGGTTAAGAGCCGAGTCCG 
TGTGAGGCGGGCAAGCATCTCTGAACCCAGTGATACGGACCCGGAGCCCCGAACTCTG 
AACCCCTCTCCGGCTGGTTGGTTCGTGCAGCAGCACCCGGAGCTGGAGCTCATGAGCA 
GCTTCCGGGAACGGTTCGGCCGCAACTGGCTGCAGTACAGGAGTCACCTGGAGCCCTC 
CGGAAACCCTCTGCCGGCCACCCCCACTACTTCTGCACCCAGTGCACCTCCAGCCAGC 
TCCCAGGGCCCCGACACTGCACCCAGACCTTCACCCCCGCAGGAGGAAGCCAGAGGCC 
CC CAGG AGTCAC CAC AG AAAATGT CAG AGG AGGTCAGGG CGG AG C CACAG G AGG AGG A 
AGAGGAGAAGGAGGGGAAGGAGGAGAAGGAGGAGGGGGAGATGGTGGAACAGGGAGAA 
GAGGAGGCAGGAGAGGAGGAAGAAGAGGAGCAGGACCAGAAGGAAGTGGAAGCGGAAC 
TCTGTCGCCCCTTGTTGGTGTGTCCCCTGGAGGGGCCTGAGGGCGTACGGGGCAGGGA 

A 1 vjL ill ^TUAWitj lA-A(- Hk. 1\jH_L.AUL. I\j ill (j Alatj I\j(jAA(_ 1 LUinuUibLT(.(jL 

ACCTTGG AGCGACTGG AG CTC CAG AGTCTGG AGGC AG CTGAG AT AG AGCCGGAGGCCC 

AGGCCCAGAGGTCGCCCAGGCCCACpGGCTCAGATCTGCTCCCTGGAGCCCCCATCCT 

CAGTCTGCGCTTCTCCTACATCTGCCCTGACCGGCAGTTGCGTCGCTATTTGGTGCTG 

GAGCCTGATGCCCACGCAGCTGTCCAGGAGCTGCTTGCCGTGTTGACCCCAGTCACCA 

ATGTGGCTCGGGAACAGCTTGGGGAGGCCAGGGACCTCCTGCTGGGTAGATTCCAGTG 

TCTACGCTGTGGCCATGAGTTCAAGCCAGAGGAGCCCAGGATGGGATTAGACAGTGAG 

GAAGGCTGGAGGCCTCTGTTCCAAAAGACAGAATCTCCTGCTGTGTGTCCTAACTGTG 

GTAGTGACCACGTGGTTCTCCTCGCTGTGTCTCGGGGAACCCCCAACAGGGAGCGGAA 

ACAGGGAGAGCAGTCTCTGGCTCCTTCTCCGTCTGCCAGCCCTGTCTGCCACCCTCCT 

GGCCATGGTGACCACCTTGACAGGGCCAAGAACAGCCCACCTCAGGCACCGAGCACCC 

GTGACCATGGTAGTTGGAGCCTCAGTCCCGCCCCTGAGCGCTGTGGCCTCCGCTCTGT 

GGACCACCGACTCCGGCTCTTCCTGGATGTTGAGGTGTTCAGCGATGCCCAGGAGGAG 

TTCCAGTGCTGCCTCAAGGTCCCAGTGGCATTGGCAGGCCACACTGGGGAGTTCATGT 

GCCTTGTGGTTGTGTCTGACCGCAGGCTGTACCTGTTGAAGGTGACTGGGGAGATGAG 

TGAGCCTCCAGCTAGCTGGCTGCAGCTGACCCTGGCTGTTCCCCTGCAGGATCTGAGT 

GGCATAGAGCTGGGCCTGGCAGGCCAGAGCCTGCGGCTAGAGTGGGCAGCTGGGGCGG 

GCCGCTGTGTGCTGGTG CCCCGAG ATG CCAGG CATTGCCGGG CCTTCCTAGAGG AGCT 

CCTTGGTGTCTTGCAGTCTCTGCCCCCTGCCTGGAGGAACTGTGTCAGTGCCACAGAG 

GAGGAGGTCACCCCCCAGCACCGGCTCTGGCCATTGCTGGAAAAAGACTCATCCTTGG 

AGGCTCGCCAGTTCTTCTACCTTCGGGCGTTCCTGGTTGAAGGTGAAGCCTCTGTGCA 

GCTGATGCTTCCCTCCACCTGCCTCGTATCCCTGTTGCTGACTCCGTCCACCCTGTTC 

CTGTT AG ATG AGGATG CTGCAGGGT CCCCGGCAG AGCCCTCT CCTCCAGCAGCATCTG 

GCGAAGCCTCTGAGAAGGTGCCTCCCTCGGGGCCGGGCCCTGCTGTGCGTGTCAGGGA 

G CAGCAGCCACTCAGCAG CCTGAGCTCCGTGCTGCTCTACCG CTCAGCCCCTGAGGAC 

TTGCGGCTGCTCTTCTACGATGAGGTGTCCCGGCTGGAGAGCTTTTGGGCACTCCGTG 

TGGTGTGTCAGGAGCAGCTGACAGCCCTGCTTGCCTGGATCCGGGAACCATGGGAGGA 
GCTGTTTTCCATCGGACTCCGGACAGTGATCCAAGAGGCGCTGGCCCTTGACCGATGA 
GGGTCCCACGCTGACCTTGGCCCTGACCTCAGGAGCCACGCT 




ORF Start: ATG at 13 


ORF Stop: TGA at 3304 




SEQ ID NO: 356 


1097 aa 


MW at 121064.1kD 


NOV127b, 

CG59971-02 Protein Sequence 


MTTAQRDSLLWKLAGLLRESGDVVLSGCSTLSLLTPTLQQLNHVFELHLGPWGPGQTG 
FVALPSHPADSPVILQLQFLFDVLQKTLSLKLVHVAGPGPTGPIKIFPFKSLRHLELR 
GVPLHCLHGLRGIYSQLETLICSRSLQALEELLSACGGDFCSALPWLALLSANFSYNA 
LTALDSSLRLLSALRFLNLSHNQVQDCQGFLMDLCELHHLDISYNRLHLVPRMGPSGA 
ALGVLILRGNELRSLPGLEQLRNLRHLDLAYNLLEGHRELSPLWLLAELRKLYLEGNP 
LWFHPEHRAATAQYLSPRARDAATGFLLDGKVLSLTDFQQTHTSLGLSPMGPPLPWPV 
GSTPETSGGPDLSDSLSSGGVVTQPLLHKVKSRVRVRRASISEPSDTDPEPRTLNPSP 
AGWFVQQHPELELMSSFRERFGRNWLQYRSHLEPSGNPLPATPTTSAPSAPPASSQGP 
DTAPRPSPPQEEARGPQESPQKMSEEVRAEPQEEEEEKEGKEEKEEGEMVEQGEEEAG 
EEEEEEQDQKEVEAELCRPLLVCPLEGPEGVRGRECFLRVTSAHLFEVELQAARTLER 
LELQSLEAAEIEPEAQAQRSPRPTGSDLLPGAPILSLRFSYICPDRQLRRYLVLEPDA 
HAAVQELLAVLTPVTNVAREQLGEARDLLLGRFQCLRCGHEFKPEEPRMGLDSEEGWR 
PLFQKTESPAVCPNCGSDHVVLLAVSRGTPNRERKQGEQSLAPSPSASPVCHPPGHGD 
HLDRAKNSPPQAPSTRDHGSWSLSPAPERCGLRSVDHRLRLFLDVEVFSDAQEEFQCC 
LKVPVALAGHTGEFMCLVVVSDRRLYLLKVTGEMSEPPASWLQLTLAVPLQDLSGIEL 
GLAGQSLRLEWAAGAGRCVLLPRDARHCRAFLEELLGVLQSLPPAWRNCVSATEEEVT 
PQHRLWPLLEKDSSLEARQFFYLRAFLVEGEASVQLMLPSTCLVSLLLTPSTLFLLDE 
DAAGSPAEPSPPAASGEASEKVPPSGPGPAVRVREQQPLSSLSSVLLYRSAPEDLRLL 
FYDEVSRLESFWALRVVCQEQLTALLAWIREPWEELFSIGLRTVIQEALALDR 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 127B. 



Table 127B. Comparison of NOV127a against NOV127b. 

Protein Sequence | NOV127a Residues / Match Residues | Identities / Similarities for the Matched Region 
NOV127b | 1..1098 / 1..1097 | 891/1098 (81%) / 891/1098 (81%) 



Further analysis of the NOV127a protein yielded the following properties shown in 
Table 127C. 



Table 127C. Protein Sequence Properties NOV127a 

PSort analysis: 0.5163 probability located in mitochondrial matrix space; 0.3000 probability 
located in microbody (peroxisome); 0.2442 probability located in mitochondrial 
inner membrane; 0.2442 probability located in mitochondrial intermembrane 
space 

SignalP analysis: No Known Signal Sequence Predicted 



A search of the NOV127a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 127D. 



Table 127D. Geneseq Results for NOV127a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV127a Residues / Match Residues | Identities / Similarities for the Matched Region | Expect Value 
AAM39827 | Human polypeptide SEQ ID NO 2972 - Homo sapiens, 169 aa. [WO200153312-A1, 26-JUL-2001] | 375..528 / 14..167 | 140/154 (90%) / 145/154 (93%) | 3e-78 
AAM41613 | Human polypeptide SEQ ID NO 6544 - Homo sapiens, 184 aa. [WO200153312-A1, 26-JUL-2001] | 375..528 / 29..182 | 140/154 (90%) / 145/154 (93%) | 4e-78 
AAU19764 | Human novel extracellular matrix protein, Seq ID No 414 - Homo sapiens, 211 aa. [WO200155368-A1, 02-AUG-2001] | 444..647 / 13..209 | 157/207 (75%) / 160/207 (76%) | 2e-75 
ABB19833 | Protein #1832 encoded by probe for measuring heart cell gene expression - Homo sapiens, 127 aa. [WO200157274-A2, 09-AUG-2001] | 409..535 / 1..127 | 127/127 (100%) / 127/127 (100%) | 2e-70 
AAM67606 | Human bone marrow expressed probe encoded protein SEQ ID NO: 27912 - Homo sapiens, 127 aa. [WO200157276-A2, 09-AUG-2001] | 409..535 / 1..127 | 127/127 (100%) / 127/127 (100%) | 2e-70 



In a BLAST search of public sequence databases, the NOV127a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 127E. 



Table 127E. Public BLASTP Results for NOV127a 

Protein Accession Number | Protein/Organism/Length | NOV127a Residues / Match Residues | Identities / Similarities for the Matched Portion | Expect Value 
AAL49726 | LKB1-INTERACTING PROTEIN 1 - Homo sapiens (Human), 1099 aa. | 1..1098 / 12..1099 | 1077/1098 (98%) / 1078/1098 (98%) | 0.0 
Q96PY9 | KIAA1898 PROTEIN - Homo sapiens (Human), 1013 aa (fragment). | 76..1098 / 1..1013 | 1003/1023 (98%) / 1003/1023 (98%) | 0.0 
Q96CN3 | SIMILAR TO RIKEN CDNA 1200014D22 GENE - Homo sapiens (Human), 804 aa (fragment). | 288..1098 / 4..804 | 793/811 (97%) / 793/811 (97%) | 0.0 
Q9DBT7 | ... musculus (Mouse), 1072 aa. | ... / 1..1072 | 895/1098 (81%) | 0.0 
Q9VMK9 | CG9044 PROTEIN - Drosophila melanogaster (Fruit fly), 1289 aa. | 12..433 / 8..463 | 139/459 (30%) / 220/459 (47%) | 6e-38 



PFam analysis predicts that the NOV127a protein contains the domains shown in 
Table 127F. 

Table 127F. Domain Analysis of NOV127a 

Pfam Domain | NOV127a Match Region | Identities / Similarities for the Matched Region | Expect Value 
LRR: domain 1 of 5 | 164..186 | 7/25 (28%) / ... | 2.5e+02 
LRR: domain 2 of 5 | 187..209 | 6/25 (24%) / ... | 2.5e+02 
LRR: domain 3 of 5 | 210..231 | 8/25 (32%) / 13/25 (52%) | 83 
LRR: domain 4 of 5 | 233..254 | 9/25 (36%) / 17/25 (68%) | 16 
LRR: domain 5 of 5 | 255..279 | 10/27 (37%) / 19/27 (70%) | 22 
Pkinase_C: domain 1 of 1 | 620..629 | 5/11 (45%) / 9/11 (82%) | 8.9 
rubredoxin: domain 1 of 2 | 669..686 | 5/18 (28%) / 13/18 (72%) | 4.6 
rubredoxin: domain 2 of 2 | 708..713 | 5/6 (83%) / 6/6 (100%) | 1.2e+03 



Example B: Sequencing Methodology and Identification of NOVX Clones 

1. GeneCalling™ Technology: This is a proprietary method of performing differential 
gene expression profiling between two or more samples developed at CuraGen and described 
by Shimkets, et al., "Gene expression analysis by transcript profiling coupled to a gene 
database query" Nature Biotechnology 17:798-803 (1999). cDNA was derived from various 
human samples representing multiple tissue types, normal and diseased states, physiological 
states, and developmental states from different donors. Samples were obtained as whole tissue, 
primary cells or tissue cultured primary cells or cell lines. Cells and cell lines may have been 
treated with biological or chemical agents that regulate gene expression, for example, growth 
factors, chemokines or steroids. The cDNA thus derived was then digested with up to as many 
as 120 pairs of restriction enzymes, and pairs of linker-adaptors specific for each pair of 
restriction enzymes were ligated to the appropriate end. The restriction digestion generates a 
mixture of unique cDNA gene fragments. Limited PCR amplification is performed with 
primers homologous to the linker-adaptor sequence, where one primer is biotinylated and the 
other is fluorescently labeled. The doubly labeled material is isolated and the fluorescently 
labeled single strand is resolved by capillary gel electrophoresis. A computer algorithm 
compares the electropherograms from an experimental and control group for each of the 
restriction digestions. This and additional sequence-derived information is used to predict the 
identity of each differentially expressed gene fragment using a variety of genetic databases. 
The identity of the gene fragment is confirmed by additional, gene-specific competitive PCR 
or by isolation and sequencing of the gene fragment. 

2. SeqCalling™ Technology: cDNA was derived from various human samples 
representing multiple tissue types, normal and diseased states, physiological states, and 
developmental states from different donors. Samples were obtained as whole tissue, primary 
cells or tissue cultured primary cells or cell lines. Cells and cell lines may have been treated 
with biological or chemical agents that regulate gene expression, for example, growth factors, 
chemokines or steroids. The cDNA thus derived was then sequenced using CuraGen's 
proprietary SeqCalling technology. Sequence traces were evaluated manually and edited for 
corrections if appropriate. cDNA sequences from all samples were assembled together, 
sometimes including public human sequences, using bioinformatic programs to produce a 
consensus sequence for each assembly. Each assembly is included in CuraGen Corporation's 
database. Sequences were included as components for assembly when the extent of identity 
with another component was at least 95% over 50 bp. Each assembly represents a gene or 
portion thereof and includes information on variants, such as splice forms, single nucleotide 
polymorphisms (SNPs), insertions, deletions and other sequence variations. 
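
The inclusion criterion stated above (identity of at least 95% over 50 bp with another component) can be illustrated with a short sketch. This is not CuraGen's assembly software, which is proprietary and not described in detail here; it assumes two sequences that have already been aligned to equal length (gaps written as "-") and interprets the criterion as requiring at least one 50-position window with at least 95% identical positions. The function and parameter names are hypothetical.

```python
def meets_inclusion_criterion(aligned_a: str, aligned_b: str,
                              window: int = 50,
                              min_identity: float = 0.95) -> bool:
    """Return True if any `window`-long stretch of the alignment reaches
    `min_identity`. Assumes pre-aligned, equal-length inputs; gap
    characters ('-') always count as mismatches."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aligned_a)
    if n < window:
        return False
    matches = [int(x == y and x != "-")
               for x, y in zip(aligned_a.upper(), aligned_b.upper())]
    # Sliding-window sum of matching positions.
    running = sum(matches[:window])
    if running / window >= min_identity:
        return True
    for i in range(window, n):
        running += matches[i] - matches[i - window]
        if running / window >= min_identity:
            return True
    return False


if __name__ == "__main__":
    ref = "ACGT" * 15                      # 60 bp toy component
    read = ref[:5] + "N" + ref[6:]         # one mismatch
    print(meets_inclusion_criterion(ref, read))
```

With a 50 bp window, a 95% threshold tolerates at most two mismatched positions per window (48/50 = 96%), since three mismatches give 47/50 = 94%.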



3. PathCalling™ Technology: 

The NOVX nucleic acid sequences are derived by laboratory screening of a cDNA 
library by the two-hybrid approach. cDNA fragments covering either the full length of the 
DNA sequence, or part of the sequence, or both, are sequenced. In silico prediction was based 
on sequences available in CuraGen Corporation's proprietary sequence databases or in the 
public human sequence databases, and provided either the full length DNA sequence, or some 
portion thereof. 

The laboratory screening was performed using the methods summarized below: 

cDNA libraries were derived from various human samples representing multiple tissue 
types, normal and diseased states, physiological states, and developmental states from 
different donors. Samples were obtained as whole tissue, primary cells or tissue cultured 
primary cells or cell lines. Cells and cell lines may have been treated with biological or 
chemical agents that regulate gene expression, for example, growth factors, chemokines or 
steroids. The cDNA thus derived was then directionally cloned into the appropriate two-hybrid 
vector (Gal4-activation domain (Gal4-AD) fusion). Such cDNA libraries as well as 
commercially available cDNA libraries from Clontech (Palo Alto, CA) were then transferred 
from E. coli into a CuraGen Corporation proprietary yeast strain (disclosed in U.S. Patents 
6,057,101 and 6,083,693, incorporated herein by reference in their entireties). 

Gal4-binding domain (Gal4-BD) fusions of a CuraGen Corporation proprietary library 
of human sequences were used to screen multiple Gal4-AD fusion cDNA libraries, resulting in 
the selection of yeast hybrid diploids in each of which the Gal4-AD fusion contains an 
individual cDNA. Each sample was amplified using the polymerase chain reaction (PCR) 
using non-specific primers at the cDNA insert boundaries. Such PCR product was sequenced; 
sequence traces were evaluated manually and edited for corrections if appropriate. cDNA 
sequences from all samples were assembled together, sometimes including public human 
sequences, using bioinformatic programs to produce a consensus sequence for each assembly. 
Each assembly is included in CuraGen Corporation's database. Sequences were included as 
components for assembly when the extent of identity with another component was at least 
95% over 50 bp. Each assembly represents a gene or portion thereof and includes information 
on variants, such as splice forms, single nucleotide polymorphisms (SNPs), insertions, 
deletions and other sequence variations. 

Physical clone: the cDNA fragment derived by the screening procedure, covering the 
entire open reading frame, is, as a recombinant DNA, cloned into the pACT2 plasmid (Clontech) 
used to make the cDNA library. The recombinant plasmid is inserted into the host and selected 
by the yeast hybrid diploid generated during the screening procedure by the mating of both 
CuraGen Corporation proprietary yeast strains N106' and YULH (U.S. Patents 6,057,101 and 
6,083,693). 

4. RACE: Techniques based on the polymerase chain reaction, such as rapid 
amplification of cDNA ends (RACE), were used to isolate or complete the predicted sequence 
of the cDNA of the invention. Usually multiple clones were sequenced from one or more 
human samples to derive the sequences for fragments. Various human tissue samples from 
different donors were used for the RACE reaction. The sequences derived from these 
procedures were included in the SeqCalling Assembly process described in preceding 
paragraphs. 

5. Exon Linking: The NOVX target sequences identified in the present invention were 
subjected to the exon linking process to confirm the sequence. PCR primers were designed by 
starting at the most upstream sequence available, for the forward primer, and at the most 
downstream sequence available, for the reverse primer. Table B1 shows the sequences of the 
PCR primers used for obtaining different clones. In each case, the sequence was examined, 
walking inward from the respective termini toward the coding sequence, until a suitable 
sequence that is either unique or highly selective was encountered, or, in the case of the 
reverse primer, until the stop codon was reached. Such primers were designed based on in 
silico predictions for the full length cDNA, part (one or more exons) of the DNA or protein 
sequence of the target sequence, or by translated homology of the predicted exons to closely 
related human sequences or to sequences from other species. These primers were then employed in PCR 
amplification based on the following pool of human cDNAs: adrenal gland, bone marrow, 
brain - amygdala, brain - cerebellum, brain - hippocampus, brain - substantia nigra, brain - 
thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, 
lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary 
gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea, 
uterus. Usually the resulting amplicons were gel purified, cloned and sequenced to high 
redundancy. The PCR product derived from exon linking was cloned into the pCR2.1 vector 
from Invitrogen. The resulting bacterial clone has an insert covering the entire open reading 
frame cloned into the pCR2.1 vector. The resulting sequences from all clones were assembled 
with themselves, with other fragments in CuraGen Corporation's database and with public 
ESTs. Fragments and ESTs were included as components for an assembly when the extent of 
their identity with another component of the assembly was at least 95% over 50 bp. In 
addition, sequence traces were evaluated manually and edited for corrections if appropriate. 
These procedures provide the sequence reported herein. 

6. Physical Clone: 

Exons were predicted by homology and the intron/exon boundaries were determined 
using standard genetic rules. Exons were further selected and refined by means of similarity 
determination using multiple BLAST (for example, tBlastN, BlastX, and BlastN) searches, 
and, in some instances, GeneScan and Grail. Expressed sequences from both public and 
proprietary databases were also added when available to further define and complete the gene 
sequence. The DNA sequence was then manually corrected for apparent inconsistencies 
thereby obtaining the sequences encoding the full-length protein. 

The PCR product derived by exon linking, covering the entire open reading frame, was 
cloned into the pCR2.1 vector from Invitrogen to provide clones used for expression and 
screening purposes. 

Example C: Quantitative expression analysis of clones in various cells and tissues 

The quantitative expression of various clones was assessed using microtiter plates 
containing RNA samples from a variety of normal and pathology-derived cells, cell lines and 
tissues using real time quantitative PCR (RTQ PCR). RTQ PCR was performed on an Applied 
Biosystems ABI PRISM® 7700 or an ABI PRISM® 7900 HT Sequence Detection System. 
Various collections of samples are assembled on the plates, and referred to as Panel 1 
(containing normal tissues and cancer cell lines), Panel 2 (containing samples derived from 
tissues from normal and cancer sources), Panel 3 (containing cancer cell lines), Panel 4 
(containing cells and cell lines from normal tissues and cells related to inflammatory 
conditions), Panel 5D/5I (containing human tissues and cell lines with an emphasis on 
metabolic diseases), AI_comprehensive_panel (containing normal tissue and samples from 
autoimmune diseases), Panel CNSD.01 (containing central nervous system samples from 
normal and diseased brains) and CNS_neurodegeneration_panel (containing samples from 
normal and Alzheimer's diseased brains). 

RNA integrity from all samples is controlled for quality by visual assessment of 
agarose gel electropherograms using 28S and 18S ribosomal RNA staining intensity ratio as a 
guide (2:1 to 2.5:1 28S:18S) and the absence of low molecular weight RNAs that would be 
indicative of degradation products. Samples are controlled against genomic DNA 
contamination by RTQ PCR reactions run in the absence of reverse transcriptase using probe 
and primer sets designed to amplify across the span of a single exon. 

First, the RNA samples were normalized to reference nucleic acids such as 
constitutively expressed genes (for example, β-actin and GAPDH). Normalized RNA (5 µl) 
was converted to cDNA and analyzed by RTQ-PCR using One Step RT-PCR Master Mix 
Reagents (Applied Biosystems; Catalog No. 4309169) and gene-specific primers according to 
the manufacturer's instructions. 

In other cases, non-normalized RNA samples were converted to single strand cDNA 
(sscDNA) using Superscript II (Invitrogen Corporation; Catalog No. 18064-147) and random 
hexamers according to the manufacturer's instructions. Reactions containing up to 10 µg of 
total RNA were performed in a volume of 20 µl and incubated for 60 minutes at 42°C. This 
reaction can be scaled up to 50 µg of total RNA in a final volume of 100 µl. sscDNA samples 
are then normalized to reference nucleic acids as described previously, using 1X TaqMan® 
Universal Master mix (Applied Biosystems; catalog No. 4324020), following the 
manufacturer's instructions. 

Probes and primers were designed for each assay according to Applied Biosystems
Primer Express Software package (version I for Apple Computer's Macintosh Power PC) or a
similar algorithm using the target sequence as input. Default settings were used for reaction
conditions and the following parameters were set before selecting primers: primer
concentration = 250 nM, primer melting temperature (Tm) range = 58°-60°C, primer optimal
Tm = 59°C, maximum primer difference = 2°C, probe does not have 5' G, probe Tm must be
10°C greater than primer Tm, amplicon size 75 bp to 100 bp. The probes and primers selected
(see below) were synthesized by Synthegen (Houston, TX, USA). Probes were double purified
by HPLC to remove uncoupled dye and evaluated by mass spectroscopy to verify coupling of
reporter and quencher dyes to the 5' and 3' ends of the probe, respectively. Their final
concentrations were: forward and reverse primers, 900nM each, and probe, 200nM.

PCR conditions: When working with RNA samples, normalized RNA from each tissue
and each cell line was spotted in each well of either a 96-well or a 384-well PCR plate
(Applied Biosystems). PCR cocktails included either a single gene-specific probe and primers
set, or two multiplexed probe and primers sets (a set specific for the target clone and another
gene-specific set multiplexed with the target probe). PCR reactions were set up using
TaqMan® One-Step RT-PCR Master Mix (Applied Biosystems, Catalog No. 4313803)
following the manufacturer's instructions. Reverse transcription was performed at 48°C for 30
minutes followed by amplification/PCR cycles as follows: 95°C for 10 min, then 40 cycles of
95°C for 15 seconds, 60°C for 1 minute. Results were recorded as CT values (the cycle at which a
given sample crosses a threshold level of fluorescence) using a log scale, with the difference in
RNA concentration between a given sample and the sample with the lowest CT value being
represented as 2 to the power of delta CT, where delta CT is the CT of the given sample minus
the lowest CT. The percent relative expression is then obtained by
taking the reciprocal of this RNA difference and multiplying by 100.
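The calculation described above can be sketched as follows. This is a minimal illustration, assuming CT values are held in a dictionary keyed by sample name; the function name `relative_expression` and the sample names and CT values in the example are hypothetical and are not taken from the assays reported here.

```python
def relative_expression(ct_values):
    """Return percent relative expression for each sample.

    ct_values: dict mapping sample name -> CT value (hypothetical input format).
    The sample with the lowest CT is assigned 100%.
    """
    lowest_ct = min(ct_values.values())
    percent = {}
    for sample, ct in ct_values.items():
        delta_ct = ct - lowest_ct           # delta CT relative to the lowest CT
        rna_difference = 2.0 ** delta_ct    # fold difference in RNA concentration
        percent[sample] = 100.0 / rna_difference  # reciprocal, times 100
    return percent


# Hypothetical CT values, for illustration only.
example_cts = {"sample_a": 33.0, "sample_b": 34.0, "sample_c": 36.0}
example_percent = relative_expression(example_cts)
```

Under this scheme each additional cycle needed to cross the threshold halves the reported relative expression, so a sample three cycles behind the lowest-CT sample is reported at one eighth of its level.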

When working with sscDNA samples, normalized sscDNA was used as described
previously for RNA samples. PCR reactions containing one or two sets of probe and primers
were set up as described previously, using 1X TaqMan® Universal Master mix (Applied
Biosystems; catalog No. 4324020), following the manufacturer's instructions. PCR
amplification was performed as follows: 95°C for 10 min, then 40 cycles of 95°C for 15 seconds,
60°C for 1 minute. Results were analyzed and processed as described previously.



Panels 1, 1.1, 1.2, and 1.3D 

The plates for Panels 1, 1.1, 1.2 and 1.3D include 2 control wells (genomic DNA
control and chemistry control) and 94 wells containing cDNA from various samples. The
samples in these panels are broken into 2 classes: samples derived from cultured cell lines and
samples derived from primary normal tissues. The cell lines are derived from cancers of the
following types: lung cancer, breast cancer, melanoma, colon cancer, prostate cancer, CNS
cancer, squamous cell carcinoma, ovarian cancer, liver cancer, renal cancer, gastric cancer and
pancreatic cancer. Cell lines used in these panels are widely available through the American
Type Culture Collection (ATCC), a repository for cultured cell lines, and were cultured using
the conditions recommended by the ATCC. The normal tissues found on these panels are
comprised of samples derived from all major organ systems from single adult individuals or
fetuses. These samples are derived from the following organs: adult skeletal muscle, fetal
skeletal muscle, adult heart, fetal heart, adult kidney, fetal kidney, adult liver, fetal liver, adult
lung, fetal lung, various regions of the brain, the spleen, bone marrow, lymph node, pancreas,
salivary gland, pituitary gland, adrenal gland, spinal cord, thymus, stomach, small intestine,
colon, bladder, trachea, breast, ovary, uterus, placenta, prostate, testis and adipose.

In the results for Panels 1, 1.1, 1.2 and 1.3D, the following abbreviations are used: 

ca. = carcinoma,
* = established from metastasis,
met = metastasis,
s cell var = small cell variant,
non-s = non-sm = non-small,
squam = squamous,
pl. eff = pl effusion = pleural effusion,
glio = glioma,
astro = astrocytoma, and
neuro = neuroblastoma.

General_screening_panel_v1.4

The plates for Panel 1.4 include 2 control wells (genomic DNA control and chemistry
control) and 94 wells containing cDNA from various samples. The samples in Panel 1.4 are
broken into 2 classes: samples derived from cultured cell lines and samples derived from
primary normal tissues. The cell lines are derived from cancers of the following types: lung
cancer, breast cancer, melanoma, colon cancer, prostate cancer, CNS cancer, squamous cell
carcinoma, ovarian cancer, liver cancer, renal cancer, gastric cancer and pancreatic cancer.
Cell lines used in Panel 1.4 are widely available through the American Type Culture
Collection (ATCC), a repository for cultured cell lines, and were cultured using the conditions
recommended by the ATCC. The normal tissues found on Panel 1.4 are comprised of pools of
samples derived from all major organ systems from 2 to 5 different adult individuals or
fetuses. These samples are derived from the following organs: adult skeletal muscle, fetal
skeletal muscle, adult heart, fetal heart, adult kidney, fetal kidney, adult liver, fetal liver, adult
lung, fetal lung, various regions of the brain, the spleen, bone marrow, lymph node, pancreas,
salivary gland, pituitary gland, adrenal gland, spinal cord, thymus, stomach, small intestine,
colon, bladder, trachea, breast, ovary, uterus, placenta, prostate, testis and adipose.
Abbreviations are as described for Panels 1, 1.1, 1.2, and 1.3D.

Panels 2D and 2.2

The plates for Panels 2D and 2.2 generally include 2 control wells and 94 test samples
composed of RNA or cDNA isolated from human tissue procured by surgeons working in
close cooperation with the National Cancer Institute's Cooperative Human Tissue Network
(CHTN) or the National Disease Research Interchange (NDRI). The tissues are derived from
human malignancies and, in cases where indicated, many malignant tissues have "matched
margins" obtained from noncancerous tissue just adjacent to the tumor. These are termed
normal adjacent tissues and are denoted "NAT" in the results below. The tumor tissue and the
"matched margins" are evaluated by two independent pathologists (the surgical pathologists
and again by a pathologist at NDRI or CHTN). This analysis provides a gross
histopathological assessment of tumor differentiation grade. Moreover, most samples include
the original surgical pathology report that provides information regarding the clinical stage of
the patient. These matched margins are taken from the tissue surrounding (i.e. immediately
proximal to) the zone of surgery (designated "NAT", for normal adjacent tissue, in Table RR).
In addition, RNA and cDNA samples were obtained from various human tissues derived from
autopsies performed on elderly people or sudden death victims (accidents, etc.). These tissues
were ascertained to be free of disease and were purchased from various commercial sources
such as Clontech (Palo Alto, CA), Research Genetics, and Invitrogen.



Panel 3D 

The plates of Panel 3D are comprised of 94 cDNA samples and two control samples.
Specifically, 92 of these samples are derived from cultured human cancer cell lines and 2 are
samples of human primary cerebellar tissue, together with the 2 controls. The human cell lines
are generally obtained
from ATCC (American Type Culture Collection), NCI or the German tumor cell bank and fall
into the following tissue groups: Squamous cell carcinoma of the tongue, breast cancer,
prostate cancer, melanoma, epidermoid carcinoma, sarcomas, bladder carcinomas, pancreatic
cancers, kidney cancers, leukemias/lymphomas, ovarian/uterine/cervical, gastric, colon, lung
and CNS cancer cell lines. In addition, there are two independent samples of cerebellum.
These cells are all cultured under standard recommended conditions and RNA was extracted using
the standard procedures. The cell lines in Panels 3D and 1.3D are among the most common cell lines
used in the scientific literature.

Panels 4D, 4R, and 4.1D

Panel 4 includes samples on a 96 well plate (2 control wells, 94 test samples)
composed of RNA (Panel 4R) or cDNA (Panels 4D/4.1D) isolated from various human cell
lines or tissues related to inflammatory conditions. Total RNA from control normal tissues
such as colon and lung (Stratagene, La Jolla, CA) and thymus and kidney (Clontech) was
employed. Total RNA from liver tissue from cirrhosis patients and kidney from lupus patients
was obtained from BioChain (Biochain Institute, Inc., Hayward, CA). Intestinal tissue for
RNA preparation from patients diagnosed as having Crohn's disease and ulcerative colitis was
obtained from the National Disease Research Interchange (NDRI) (Philadelphia, PA).

Astrocytes, lung fibroblasts, dermal fibroblasts, coronary artery smooth muscle cells,
small airway epithelium, bronchial epithelium, microvascular dermal endothelial cells,
microvascular lung endothelial cells, human pulmonary aortic endothelial cells, and human
umbilical vein endothelial cells were all purchased from Clonetics (Walkersville, MD) and
grown in the media supplied for these cell types by Clonetics. These primary cell types were
activated with various cytokines or combinations of cytokines for 6 and/or 12-14 hours, as
indicated. The following cytokines were used: IL-1 beta at approximately 1-5ng/ml, TNF
alpha at approximately 5-10ng/ml, IFN gamma at approximately 20-50ng/ml, IL-4 at
approximately 5-10ng/ml, IL-9 at approximately 5-10ng/ml, IL-13 at approximately
5-10ng/ml. Endothelial cells were sometimes starved for various times by culture in the basal
media from Clonetics with 0.1% serum.

Mononuclear cells were prepared from blood of employees at CuraGen Corporation,
using Ficoll. LAK cells were prepared from these cells by culture in DMEM 5% FCS
(Hyclone), 100µM non essential amino acids (Gibco/Life Technologies, Rockville, MD),
1mM sodium pyruvate (Gibco), mercaptoethanol 5.5x10⁻⁵ M (Gibco), and 10mM Hepes
(Gibco) and Interleukin 2 for 4-6 days. Cells were then either activated with 10-20ng/ml PMA
and 1-2µg/ml ionomycin, IL-12 at 5-10ng/ml, IFN gamma at 20-50ng/ml and IL-18 at
5-10ng/ml for 6 hours. In some cases, mononuclear cells were cultured for 4-5 days in DMEM
5% FCS (Hyclone), 100µM non essential amino acids (Gibco), 1mM sodium pyruvate
(Gibco), mercaptoethanol 5.5x10⁻⁵ M (Gibco), and 10mM Hepes (Gibco) with PHA
(phytohemagglutinin) or PWM (pokeweed mitogen) at approximately 5µg/ml. Samples were
taken at 24, 48 and 72 hours for RNA preparation. MLR (mixed lymphocyte reaction) samples
were obtained by taking blood from two donors, isolating the mononuclear cells using Ficoll
and mixing the isolated mononuclear cells 1:1 at a final concentration of approximately
2x10⁶ cells/ml in DMEM 5% FCS (Hyclone), 100µM non essential amino acids (Gibco), 1mM
sodium pyruvate (Gibco), mercaptoethanol (5.5x10⁻⁵ M) (Gibco), and 10mM Hepes (Gibco).
The MLR was cultured and samples taken at various time points ranging from 1-7 days for
RNA preparation.

Monocytes were isolated from mononuclear cells using CD14 Miltenyi Beads, positive VS
selection columns and a Vario Magnet according to the manufacturer's instructions.
Monocytes were differentiated into dendritic cells by culture in DMEM 5% fetal calf serum
(FCS) (Hyclone, Logan, UT), 100µM non essential amino acids (Gibco), 1mM sodium
pyruvate (Gibco), mercaptoethanol 5.5x10⁻⁵ M (Gibco), and 10mM Hepes (Gibco), 50ng/ml
GMCSF and 5ng/ml IL-4 for 5-7 days. Macrophages were prepared by culture of monocytes
for 5-7 days in DMEM 5% FCS (Hyclone), 100µM non essential amino acids (Gibco), 1mM
sodium pyruvate (Gibco), mercaptoethanol 5.5x10⁻⁵ M (Gibco), 10mM Hepes (Gibco) and
10% AB Human Serum or MCSF at approximately 50ng/ml. Monocytes, macrophages and
dendritic cells were stimulated for 6 and 12-14 hours with lipopolysaccharide (LPS) at
100ng/ml. Dendritic cells were also stimulated with anti-CD40 monoclonal antibody
(Pharmingen) at 10µg/ml for 6 and 12-14 hours.

CD4 lymphocytes, CD8 lymphocytes and NK cells were also isolated from
mononuclear cells using CD4, CD8 and CD56 Miltenyi beads, positive VS selection columns
and a Vario Magnet according to the manufacturer's instructions. CD45RA and CD45RO CD4
lymphocytes were isolated by depleting mononuclear cells of CD8, CD56, CD14 and CD19
cells using CD8, CD56, CD14 and CD19 Miltenyi beads and positive selection. CD45RO
beads were then used to isolate the CD45RO CD4 lymphocytes with the remaining cells being
CD45RA CD4 lymphocytes. CD45RA CD4, CD45RO CD4 and CD8 lymphocytes were
placed in DMEM 5% FCS (Hyclone), 100µM non essential amino acids (Gibco), 1mM
sodium pyruvate (Gibco), mercaptoethanol 5.5x10⁻⁵ M (Gibco), and 10mM Hepes (Gibco) and
plated at 10⁶ cells/ml onto Falcon 6 well tissue culture plates that had been coated overnight
with 0.5ng/ml anti-CD28 (Pharmingen) and 3µg/ml anti-CD3 (OKT3, ATCC) in PBS. After 6
and 24 hours, the cells were harvested for RNA preparation. To prepare chronically activated
CD8 lymphocytes, we activated the isolated CD8 lymphocytes for 4 days on anti-CD28 and
anti-CD3 coated plates and then harvested the cells and expanded them in DMEM 5% FCS
(Hyclone), 100µM non essential amino acids (Gibco), 1mM sodium pyruvate (Gibco),
mercaptoethanol 5.5x10⁻⁵ M (Gibco), and 10mM Hepes (Gibco) and IL-2. The expanded CD8
cells were then activated again with plate bound anti-CD3 and anti-CD28 for 4 days and
expanded as before. RNA was isolated 6 and 24 hours after the second activation and after 4
days of the second expansion culture. The isolated NK cells were cultured in DMEM 5% FCS
(Hyclone), 100µM non essential amino acids (Gibco), 1mM sodium pyruvate (Gibco),
mercaptoethanol 5.5x10⁻⁵ M (Gibco), and 10mM Hepes (Gibco) and IL-2 for 4-6 days before
RNA was prepared.

To obtain B cells, tonsils were procured from NDRI. The tonsil was cut up with sterile
dissecting scissors and then passed through a sieve. Tonsil cells were then spun down and
resuspended at 10⁶ cells/ml in DMEM 5% FCS (Hyclone), 100µM non essential amino acids
(Gibco), 1mM sodium pyruvate (Gibco), mercaptoethanol 5.5x10⁻⁵ M (Gibco), and 10mM
Hepes (Gibco). To activate the cells, we used PWM at 5ng/ml or anti-CD40 (Pharmingen) at
approximately 10µg/ml and IL-4 at 5-10ng/ml. Cells were harvested for RNA preparation at
24, 48 and 72 hours.

To prepare the primary and secondary Th1/Th2 and Tr1 cells, six-well Falcon plates
were coated overnight with 10ng/ml anti-CD28 (Pharmingen) and 2µg/ml OKT3 (ATCC), and
then washed twice with PBS. Umbilical cord blood CD4 lymphocytes (Poietic Systems,
Germantown, MD) were cultured at 10⁵-10⁶ cells/ml in DMEM 5% FCS (Hyclone), 100µM
non essential amino acids (Gibco), 1mM sodium pyruvate (Gibco), mercaptoethanol
5.5x10⁻⁵ M (Gibco), 10mM Hepes (Gibco) and IL-2 (4ng/ml). IL-12 (5ng/ml) and anti-IL4 (µg/ml)
were used to direct to Th1, while IL-4 (5ng/ml) and anti-IFN gamma (1µg/ml) were used to
direct to Th2 and IL-10 at 5ng/ml was used to direct to Tr1. After 4-5 days, the activated Th1,
Th2 and Tr1 lymphocytes were washed once in DMEM and expanded for 4-7 days in DMEM
5% FCS (Hyclone), 100µM non essential amino acids (Gibco), 1mM sodium pyruvate
(Gibco), mercaptoethanol 5.5x10⁻⁵ M (Gibco), 10mM Hepes (Gibco) and IL-2 (1ng/ml).
Following this, the activated Th1, Th2 and Tr1 lymphocytes were re-stimulated for 5 days
with anti-CD28/OKT3 and cytokines as described above, but with the addition of anti-CD95L
(1µg/ml) to prevent apoptosis. After 4-5 days, the Th1, Th2 and Tr1 lymphocytes were
washed and then expanded again with IL-2 for 4-7 days. Activated Th1 and Th2 lymphocytes
were maintained in this way for a maximum of three cycles. RNA was prepared from primary
and secondary Th1, Th2 and Tr1 after 6 and 24 hours following the second and third
activations with plate bound anti-CD3 and anti-CD28 mAbs and 4 days into the second and
third expansion cultures in Interleukin 2.

The following leukocyte cell lines were obtained from the ATCC: Ramos, EOL-1,
KU-812. EOL cells were further differentiated by culture in 0.1mM dbcAMP at 5x10⁵ cells/ml
for 8 days, changing the media every 3 days and adjusting the cell concentration to
5x10⁵ cells/ml. For the culture of these cells, we used DMEM or RPMI (as recommended by
the ATCC), with the addition of 5% FCS (Hyclone), 100µM non essential amino acids
(Gibco), 1mM sodium pyruvate (Gibco), mercaptoethanol 5.5x10⁻⁵ M (Gibco), 10mM Hepes
(Gibco). RNA was either prepared from resting cells or cells activated with PMA at 10ng/ml
and ionomycin at 1µg/ml for 6 and 14 hours. Keratinocyte line CCD1106 and an airway
epithelial tumor line NCI-H292 were also obtained from the ATCC. Both were cultured in
DMEM 5% FCS (Hyclone), 100µM non essential amino acids (Gibco), 1mM sodium pyruvate
(Gibco), mercaptoethanol 5.5x10⁻⁵ M (Gibco), and 10mM Hepes (Gibco). CCD1106 cells were
activated for 6 and 14 hours with approximately 5ng/ml TNF alpha and 1ng/ml IL-1 beta,
while NCI-H292 cells were activated for 6 and 14 hours with the following cytokines: 5ng/ml
IL-4, 5ng/ml IL-9, 5ng/ml IL-13 and 25ng/ml IFN gamma.

For these cell lines and blood cells, RNA was prepared by lysing approximately
10⁷ cells/ml using Trizol (Gibco BRL). Briefly, 1/10 volume of bromochloropropane
(Molecular Research Corporation) was added to the RNA sample, vortexed and after 10
minutes at room temperature, the tubes were spun at 14,000 rpm in a Sorvall SS34 rotor. The
aqueous phase was removed and placed in a 15ml Falcon Tube. An equal volume of
isopropanol was added and left at -20°C overnight. The precipitated RNA was spun down at
9,000 rpm for 15 min in a Sorvall SS34 rotor and washed in 70% ethanol. The pellet was
redissolved in 300µl of RNAse-free water and 35µl buffer (Promega), 5µl DTT, 7µl RNAsin
and 8µl DNAse were added. The tube was incubated at 37°C for 30 minutes to remove
contaminating genomic DNA, extracted once with phenol chloroform and re-precipitated with
1/10 volume of 3M sodium acetate and 2 volumes of 100% ethanol. The RNA was spun down
and placed in RNAse free water. RNA was stored at -80°C.

AI_comprehensive panel_v1.0

The plates for AI_comprehensive panel_v1.0 include two control wells and 89 test
samples comprised of cDNA isolated from surgical and postmortem human tissues obtained
from the Backus Hospital and Clinomics (Frederick, MD). Total RNA was extracted from
tissue samples from the Backus Hospital in the Facility at CuraGen. Total RNA from other
tissues was obtained from Clinomics.

Joint tissues including synovial fluid, synovium, bone and cartilage were obtained from
patients undergoing total knee or hip replacement surgery at the Backus Hospital. Tissue
samples were immediately snap frozen in liquid nitrogen to ensure that isolated RNA was of
optimal quality and not degraded. Additional samples of osteoarthritis and rheumatoid arthritis
joint tissues were obtained from Clinomics. Normal control tissues were supplied by
Clinomics and were obtained during autopsy of trauma victims.

Surgical specimens of psoriatic tissues and adjacent matched tissues were provided as 
total RNA by Clinomics. Two male and two female patients were selected between the ages of 
25 and 47. None of the patients were taking prescription drugs at the time samples were 
isolated. 

Surgical specimens of diseased colon from patients with ulcerative colitis and Crohn's
disease and adjacent matched tissues were obtained from Clinomics. Bowel tissue from three
female and three male Crohn's patients between the ages of 41 and 69 was used. Two patients
were not on prescription medication while the others were taking dexamethasone,
phenobarbital, or tylenol. Ulcerative colitis tissue was from three male and four female
patients. Four of the patients were taking lebvid and two were on phenobarbital.

Total RNA from post mortem lung tissue from trauma victims with no disease or with
emphysema, asthma or COPD was purchased from Clinomics. Emphysema patients ranged in
age from 40-70 and all were smokers; this age range was chosen to focus on patients with
cigarette-linked emphysema and to avoid those patients with alpha-1 anti-trypsin deficiencies.
Asthma patients ranged in age from 36-75, and smokers were excluded to avoid patients that
could also have COPD. COPD patients ranged in age from 35-80 and included both smokers
and non-smokers. Most patients were taking corticosteroids and bronchodilators.

In the labels employed to identify tissues in the AI_comprehensive panel_v1.0 panel,
the following abbreviations are used:

AI = Autoimmunity
Syn = Synovial
Normal = No apparent disease
Rep22/Rep20 = Individual patients
RA = Rheumatoid arthritis
Backus = From Backus Hospital
OA = Osteoarthritis
(SS) (BA) (MF) = Individual patients
Adj = Adjacent tissue
Match control = Adjacent tissues
-M = Male
-F = Female
COPD = Chronic obstructive pulmonary disease

Panels 5D and 5I

The plates for Panels 5D and 5I include two control wells and a variety of cDNAs
isolated from human tissues and cell lines with an emphasis on metabolic diseases. Metabolic 
tissues were obtained from patients enrolled in the Gestational Diabetes study. Cells were 
obtained during different stages in the differentiation of adipocytes from human mesenchymal 
stem cells. Human pancreatic islets were also obtained. 

In the Gestational Diabetes study subjects are young (18-40 years), otherwise healthy 
women with and without gestational diabetes undergoing routine (elective) Caesarean section. 
After delivery of the infant, when the surgical incisions were being repaired/closed, the 
obstetrician removed a small sample (<1 cc) of the exposed metabolic tissues during the 
closure of each surgical level. The biopsy material was rinsed in sterile saline, blotted and fast 
frozen within 5 minutes from the time of removal. The tissue was then flash frozen in liquid 
nitrogen and stored, individually, in sterile screw-top tubes and kept on dry ice for shipment to 
or to be picked up by CuraGen. The metabolic tissues of interest include uterine wall (smooth 
muscle), visceral adipose, skeletal muscle (rectus) and subcutaneous adipose. Patient 
descriptions are as follows: 

Patient 2 Diabetic Hispanic, overweight, not on insulin 

Patient 7-9 Nondiabetic Caucasian and obese (BMI>30) 

Patient 10 Diabetic Hispanic, overweight, on insulin 

Patient 11 Nondiabetic African American and overweight

Patient 12 Diabetic Hispanic on insulin 

Adipocyte differentiation was induced in donor progenitor cells obtained from Osiris
(a division of Clonetics/BioWhittaker) in triplicate, except for Donor 3U which had only two
replicates. Scientists at Clonetics isolated, grew and differentiated human mesenchymal stem
cells (HuMSCs) for CuraGen based on the published protocol found in Mark F. Pittenger, et
al., Multilineage Potential of Adult Human Mesenchymal Stem Cells, Science, Apr 2 1999:
143-147. Clonetics provided Trizol lysates or frozen pellets suitable for mRNA isolation and
ds cDNA production. A general description of each donor is as follows:

Donor 2 and 3 U: Mesenchymal Stem cells, Undifferentiated Adipose
Donor 2 and 3 AM: Adipose, Adipose Midway Differentiated
Donor 2 and 3 AD: Adipose, Adipose Differentiated

Human cell lines were generally obtained from ATCC (American Type Culture 
Collection), NCI or the German tumor cell bank and fall into the following tissue groups: 
kidney proximal convoluted tubule, uterine smooth muscle cells, small intestine, liver HepG2 
cancer cells, heart primary stromal cells, and adrenal cortical adenoma cells. These cells are all 
cultured under standard recommended conditions and RNA extracted using the standard 
procedures. All samples were processed at CuraGen to produce single stranded cDNA. 

Panel 5I contains all samples previously described with the addition of pancreatic islets
from a 58 year old female patient obtained from the Diabetes Research Institute at the
University of Miami School of Medicine. Islet tissue was processed to total RNA at an outside
source and delivered to CuraGen for addition to Panel 5I.

In the labels employed to identify tissues in the 5D and 5I panels, the following
abbreviations are used:

GO Adipose = Greater Omentum Adipose 
SK = Skeletal Muscle 
UT = Uterus 
PL = Placenta 

AD = Adipose Differentiated 

AM = Adipose Midway Differentiated 

U = Undifferentiated Stem Cells 

Panel CNSD.01 

The plates for Panel CNSD.01 include two control wells and 94 test samples 
comprised of cDNA isolated from postmortem human brain tissue obtained from the Harvard 
Brain Tissue Resource Center. Brains are removed from calvaria of donors between 4 and 24 
hours after death, sectioned by neuroanatomists, and frozen at -80°C in liquid nitrogen vapor. 
All brains are sectioned and examined by neuropathologists to confirm diagnoses with clear 
associated neuropathology. 

Disease diagnoses are taken from patient records. The panel contains two brains from
each of the following diagnoses: Alzheimer's disease, Parkinson's disease, Huntington's
disease, Progressive Supranuclear Palsy, Depression, and "Normal controls". Within each of
these brains, the following regions are represented: cingulate gyrus, temporal pole, globus
pallidus, substantia nigra, Brodmann Area 4 (primary motor strip), Brodmann Area 7 (parietal
cortex), Brodmann Area 9 (prefrontal cortex), and Brodmann Area 17 (occipital cortex). Not all
brain regions are represented in all cases; e.g., Huntington's disease is characterized in part by
neurodegeneration in the globus pallidus, thus this region is impossible to obtain from
confirmed Huntington's cases. Likewise Parkinson's disease is characterized by degeneration
of the substantia nigra, making this region more difficult to obtain. Normal control brains were
examined for neuropathology and found to be free of any pathology consistent with
neurodegeneration.

In the labels employed to identify tissues in the CNS panel, the following abbreviations 
are used: 

PSP = Progressive supranuclear palsy
Sub Nigra = Substantia nigra
Glob Palladus = Globus pallidus
Temp Pole = Temporal pole
Cing Gyr = Cingulate gyrus
BA 4 = Brodmann Area 4

Panel CNS_Neurodegeneration_V1.0 

The plates for Panel CNS_Neurodegeneration_V1.0 include two control wells and 47 
test samples comprised of cDNA isolated from postmortem human brain tissue obtained from 
the Harvard Brain Tissue Resource Center (McLean Hospital) and the Human Brain and 
Spinal Fluid Resource Center (VA Greater Los Angeles Healthcare System). Brains are 
removed from calvaria of donors between 4 and 24 hours after death, sectioned by 
neuroanatomists, and frozen at -80°C in liquid nitrogen vapor. All brains are sectioned and 
examined by neuropathologists to confirm diagnoses with clear associated neuropathology. 

Disease diagnoses are taken from patient records. The panel contains six brains from
Alzheimer's disease (AD) patients, and eight brains from "Normal controls" who showed no
evidence of dementia prior to death. The eight normal control brains are divided into two
categories: controls with no dementia and no Alzheimer's-like pathology (Controls) and
controls with no dementia but evidence of severe Alzheimer's-like pathology (specifically
senile plaque load rated as level 3 on a scale of 0-3; 0 = no evidence of plaques, 3 = severe AD
senile plaque load). Within each of these brains, the following regions are represented:
hippocampus, temporal cortex (Brodmann Area 21), parietal cortex (Brodmann Area 7), and
occipital cortex (Brodmann Area 17). These regions were chosen to encompass all levels of
neurodegeneration in AD. The hippocampus is a region of early and severe neuronal loss in
AD; the temporal cortex is known to show neurodegeneration in AD after the hippocampus;
the parietal cortex shows moderate neuronal death in the late stages of the disease; the
occipital cortex is spared in AD and therefore acts as a "control" region within AD patients.
Not all brain regions are represented in all cases.

In the labels employed to identify tissues in the CNS_Neurodegeneration_V1.0 panel, 
the following abbreviations are used: 

AD = Alzheimer's disease brain; patient was demented and showed AD-like pathology
upon autopsy
Control = Control brains; patient not demented, showing no neuropathology
Control (Path) = Control brains; patient not demented but showing severe AD-like
pathology
SupTemporal Ctx = Superior Temporal Cortex
Inf Temporal Ctx = Inferior Temporal Cortex

A. CG58522-01: HUMAN PLATELET-ACTIVATING FACTOR 
ACETYLHYDROLASE IB BETA 

Expression of gene CG58522-01 was assessed using the primer-probe set Ag3365, 
described in Table AA. Results of the RTQ-PCR runs are shown in Table AB. 



Table AA. Probe Name Ag3365

Primers   Sequences                                    Length   Start Position   SEQ ID NO:
Forward   5'-cagaatgaaccaaggagactca-3'                 22       3                357
Probe     TET-5'-ctactccgcatgcggcagaagacatt-3'-TAMRA   26       35               358
Reverse   5'-cacatccatctgtcatctcctt-3'                 22       62               359
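As an illustrative cross-check, the Table AA entries can be compared against the design parameters stated earlier (probe without a 5' G, amplicon size of 75 bp to 100 bp). The sketch below is not part of the original procedure. It assumes that each start position refers to the first base of the oligonucleotide's binding site on the target sequence, so that the amplicon spans from the forward primer start to the end of the reverse primer site; the variable names are hypothetical.

```python
# Sequences and start positions as listed in Table AA for probe set Ag3365.
PRIMER_SET = {
    "forward": ("cagaatgaaccaaggagactca", 3),
    "probe": ("ctactccgcatgcggcagaagacatt", 35),
    "reverse": ("cacatccatctgtcatctcctt", 62),
}

# Oligonucleotide lengths, to compare with the Length column of Table AA.
lengths = {name: len(seq) for name, (seq, _start) in PRIMER_SET.items()}

# Design rule: the probe must not begin with G at its 5' end.
probe_has_5prime_g = PRIMER_SET["probe"][0][0] == "g"

# Amplicon size under the stated assumption about start positions:
# from the forward primer start to the last base of the reverse primer site.
forward_start = PRIMER_SET["forward"][1]
reverse_seq, reverse_start = PRIMER_SET["reverse"]
amplicon_bp = reverse_start + len(reverse_seq) - forward_start

amplicon_in_range = 75 <= amplicon_bp <= 100
```

Under that assumption the amplicon comes to 81 bp, which falls inside the 75 bp to 100 bp window, and the computed lengths agree with the Length column.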



Table AB . General_screening_panel_vl.4 



Tissue Name 


jRel. Exp.(%) Ag3365, 
j Run 216709759 


Tissue Name 


Rel. Exp.(%) Ag3365, 
Run 216709759 


Adipose 


1 0.0 


Renal ca TK-10 


0.0 


Melanoma* 
Hs688(A).T 


0.0 


Bladder 


0.0 


Melanoma* 
Hs688(B).T 


1 00 


Gastric ca. (liver met.) 
NCI-N87 


0.0 


Melanoma* M14 


! 00 


Gastric ca. KATO III 


0.0 


Melanoma* 


i . o.o 


Colon ca. SW-948 


0.0 
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LOXIMVI 






Melanoma* SK- 
JVLbLo 


0.0 


Colon ca. SW480 


0.0 


Squamous cell 
carcinoma 


0.0 


Colon ca* (SW480 

met) SW620 


0.0 


1 estis rOOi 


1 n 7 


Colon ca. HT29 


0.0 


Prostate ca.* (bone 
met) PC-3 

1 11V> t I X J 


0.0 


Colon ca. HCT-116 


0.0 


Prostate Pool 


0.0 


Colon ca. CaCo-2 


0.0 


rlacenta 


A A 

U.U 


Colon cancer tissue 


0.0 


Uterus Pool | 


0.0 


Colon ca. SW1116 


0.0 


Ovarian ca. 
OVCAR-3 


0.0 


Colon ca Colo-205 


0 0 


Ovarian ca. SK-OV- j 
3 


4.9 


Colon ca SW-48 


0.0 


Ovarian ca. 
OVCAR-4 


0.0 


Colon Pool 


0 0 


Ovarian ca. 
OVCAR-5 ; 


0.0 


Small Tnte^Hnp Pnnl 

kJiiiaii .unwound j. uui 


0 0 


Ovarian ca. IGROV- 

1 
1 


7.9 


Stomach Pool 


0.0 


Ovarian ca. 
OVCAR-8 


26.8 


Bone Marrow Pool 


0.0 


Ovary 


U.U 


Fetal Heart 


0.0 


breast ca. Mtr -7 


U.U 


Heart Pool 


0.0 


rJreast ca. MDA- 
MB-231 


1.7 


Lymph Node Pool 


16.5 


Breast ca BT 549 


0.0 


Fetal Skeletal Muscle 


0.0 


Breast ca. T47D I 


0.0 


Skeletal Muscle Pool 


0.0 


Breast ca. jvlua-in 


A A 

U.U 


Spleen Pool 


0.0 


Breast Pool 


0.0 


Thymus Pool 


0.0 


Trachea 


0.0 


CNS cancer (glio/astro) 
U87-MG 


0 0 


Lung 


0.0 


CNS cancer (glio/astro) 
U-118-MG 


0.0


Fetal Lung 


0.0 


CNS cancer 
(neuro;met) SK-N-AS


0.0


Lung ca. NCI-N417


0.0 


CNS cancer (astro) SF- 
539 


0.0


Lung ca. LX-1 


3.3 


CNS cancer (astro) 
SNB-75 


0.0 


Lung ca. NCI-H146


4.5 


CNS cancer (glio) 
SNB-19 


6.2 


Lung ca. SHP-77 


0.0 


CNS cancer (glio) SF- 
295 


25.7 
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Lung ca. A549 


0.0 


Brain (Amygdala) Pool 


0.0 


Lung ca. NCI-H526


0.0


Brain (cerebellum)


0.0


Lung ca. NCI-H23


100.0


Brain (fetal)


0.0


Lung ca. NCI-H460 


0.0 


Brain (Hippocampus)
Pool


4.8 


Lung ca. HOP-62 


0.0


Cerebral Cortex Pool


0.0


Lung ca. NCI-H522 


0.0 


Brain (Substantia nigra) 
Pool 


1.8 


Liver 


0.0 


Brain (Thalamus) Pool


3.6


Fetal Liver


0.0


Brain (whole)


6.9


Liver ca. HepG2


0.0


Spinal Cord Pool


0.0


Kidney Pool


0.0


Adrenal Gland


0.0


Fetal Kidney


0.0


Pituitary gland Pool


0.0 


Renal ca. 786-0 


0.0 


Salivary Gland 


0.0 


Renal ca. A498 


0.0 


Thyroid (female) 


0.0 


Renal ca. ACHN 


0.0 


Pancreatic ca. CAPAN2 


0.0 


Renal ca. UO-31 


0.0 


Pancreas Pool 


0.0 



CNS_neurodegeneration_v1.0 Summary: Ag3365 - Expression of this gene is
low/undetectable (CTs > 35) across all of the samples on this panel (data not shown). 



General_screening_panel_v1.4 Summary: Ag3365 - Significant expression of this gene is
seen only in the lung cancer cell line NCI-H23 (CT=33.1). Therefore, expression of this gene 
may be used to distinguish this sample from the other samples on this panel. 

Panel 4D Summary: Ag3365 - Expression of this gene is low/undetectable (CTs > 35) across 
all of the samples on this panel (data not shown). 
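
The panel summaries above apply a cycle-threshold (CT) cutoff of 35: samples with CT values above 35 are reported as low/undetectable. A minimal Python sketch of that rule follows. The function names are hypothetical, and the 2^(-delta CT) scaling used to express a sample relative to the highest-expressing sample is a common RTQ-PCR convention assumed here for illustration; it is not a formula stated in this section.

```python
# Cutoff quoted in the panel summaries: CT values above 35 are low/undetectable.
CT_CUTOFF = 35.0


def is_detectable(ct: float, cutoff: float = CT_CUTOFF) -> bool:
    """Return True when a CT value is at or below the cutoff."""
    return ct <= cutoff


def relative_expression(ct: float, ct_min: float) -> float:
    """Percent expression relative to the sample with the lowest CT.

    Assumes perfect doubling per cycle (2^-deltaCT); this is an illustrative
    convention, not the method stated in the text.
    """
    return 100.0 * 2.0 ** (ct_min - ct)


if __name__ == "__main__":
    # 33.1 is the CT reported for NCI-H23; 36.0 is a hypothetical sample.
    cts = {"NCI-H23": 33.1, "hypothetical sample": 36.0}
    ct_min = min(cts.values())
    for name, ct in cts.items():
        print(name, is_detectable(ct), round(relative_expression(ct, ct_min), 1))
```

Under this assumption a sample one cycle later than the reference would be reported at 50% relative expression, which is why late CT values compress toward 0.0 in the tables.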

B. CG58520-01: GAMMA-AMINOBUTYRIC-ACID RECEPTOR GAMMA-1 

Expression of gene CG58520-01 was assessed using the primer-probe set Ag3364, 
described in Table BA. 



Table BA. Probe Name Ag3364

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-ttcttctgcggagtcaaagtag-3' | 22 | 43 | 360
Probe | TET-5'-ttggtcttcttgttactgaccctgca-3'-TAMRA | 26 | 75 | 361
Reverse | 5'-tcatctgccttatcaacgtttc-3' | 22 | 106 | 362

CNS_neurodegeneration_v1.0 Summary: Ag3364 - Expression of this gene is
low/undetectable (CTs > 35) across all of the samples on this panel (data not shown).

General_screening_panel_v1.4 Summary: Ag3364 - Expression of this gene is
low/undetectable (CTs > 35) across all of the samples on this panel (data not shown). 

Panel 4D Summary: Ag3364 - Expression of this gene is low/undetectable (CTs > 35) across 
all of the samples on this panel (data not shown). 

Panel CNS_1 Summary: Ag3364 - Expression of this gene is low/undetectable (CTs > 35) 
across all of the samples on this panel (data not shown). 

C. CG58520-03: GAMMA-AMINOBUTYRIC-ACID RECEPTOR GAMMA-1
SUBUNIT PRECURSOR (GABA(A) RECEPTOR)

Expression of gene CG58520-03 was assessed using the primer-probe set Ag5092,
described in Table CA.

Table CA. Probe Name Ag5092

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-gaacattcctgtccactgga-3' | 20 | 625 | 363
Probe | TET-5'-attttcaagcgatggataccctaaaa-3'-TAMRA | 26 | 645 | 364
Reverse | 5'-cacttctacggagggctttt-3' | 20 | 692 | 365



CNS_neurodegeneration_v1.0 Summary: Ag5092 - Expression of this gene is
low/undetectable (CTs > 35) across all of the samples on this panel (data not shown). 



General_screening_panel_v1.5 Summary: Ag5092 - Expression of this gene is
low/undetectable (CTs > 35) across all of the samples on this panel (data not shown). 

Panel 4.1D Summary: Ag5092 - Expression of this gene is low/undetectable (CTs > 35) 
across all of the samples on this panel (data not shown). 

D. CG58518-01: GAMMA-AMINOBUTYRIC-ACID RECEPTOR RHO-3 - 
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Expression of gene CG58518-01 was assessed using the primer-probe sets Ag3363, 
Ag1130, Ag1198, Ag1253 and Ag1603, described in Tables DA, DB, DC, DD and DE.
Results of the RTQ-PCR runs are shown in Tables DF, DG and DH. 



Table DA. Probe Name Ag3363

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-tggctttccagttagtctcctt-3' | 22 | 14 | 366
Probe | TET-5'-cacctacatctggatcatattgaaacca-3'-TAMRA | 28 | 36 | 367
Reverse | 5'-ttgatgttagaagcagcacaaa-3' | 22 | 68 | 368


Table DB. Probe Name Ag1130

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-gtcctggctttccagttagtct-3' | 22 | 10 | 369
Probe | TET-5'-tcacctacatctggatcatattgaaacca-3'-TAMRA | 29 | 35 | 370
Reverse | 5'-ttgatgttagaagcagcacaaa-3' | 22 | 68 | 371


Table DC. Probe Name Ag1198

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-gtcctggctttccagttagtct-3' | 22 | 10 | 372
Probe | TET-5'-tcacctacatctggatcatattgaaacca-3'-TAMRA | 29 | 35 | 373
Reverse | 5'-ttgatgttagaagcagcacaaa-3' | 22 | 68 | 374


Table DD. Probe Name Ag1253

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-atctgggtgcctgatatctttt-3' | 22 | 466 | 375
Probe | TET-5'-tgtccactctaaaagatccttcatccatga-3'-TAMRA | 30 | 489 | 376
Reverse | 5'-cgcagcatgatattctccatag-3' | 22 | 524 | 377


Table DE. Probe Name Ag1603

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-gtcctggctttccagttagtct-3' | 22 | 10 | 378
Probe | TET-5'-tcacctacatctggatcatattgaaacca-3'-TAMRA | 29 | 35 | 379
Reverse | 5'-ttgatgttagaagcagcacaaa-3' | 22 | 68 | 380



Table DF. General_screening_panel_v1.4



Tissue Name 


Rel. Exp.(%) Ag3363, 
KUD Zlo/UySSy 


Tissue Name 


Rel. Exp.(%) Ag3363, 
Kun Zlo7U5o5y 


Adipose 


0.0 


Renal ca. TK-10


0.0 


Melanoma*
Hs688(A).T


0.0 


Bladder 


6.6 


Melanoma*
Hs688(B).T


0.0 


Gastric ca. (liver met.)
NCI-N87


0.0 


Melanoma* M14 


0.0


Gastric ca. KATO III


0.0


Melanoma* 
LOXIMVI 


0.0 


Colon ca. SW-948 


0.0 


Melanoma* SK- 
MEL-5 


0.0 


Colon ca. SW480 


0.0 


Squamous cell
carcinoma SCC-4


0.0 


Colon ca.* (SW480
met) SW620


0.0 


Testis Pool 


16.7 


Colon ca. HT29 


0.0 


Prostate ca.* (bone 
met) PC-3 


0.0


Colon ca. HCT-116


0.0 


Prostate Pool


0.0 


Colon ca. CaCo-2


0.0


Placenta 


0.0 


Colon cancer tissue 


0.0 


Uterus Pool 


0.0 


Colon ca. SW1116


0.0 


Ovarian ca. 
OVCAR-3 


0.0 


Colon ca. Colo-205 


0.0 


Ovarian ca. SK-OV-
3


0.0 


Colon ca. SW-48 


0.0 


Ovarian ca. 
OVCAR-4 


0.0 


Colon Pool 


0.0 


Ovarian ca.
OVCAR-5 


0.0 


Small Intestine Pool 


0.0 


Ovarian ca. IGROV- 
1 


0.0 


Stomach Pool 


0.0 


Ovarian ca. 
OVCAR-8 


0.0 


Bone Marrow Pool 


0.0 


Ovary 


0.0 


Fetal Heart 


0.0 


Breast ca. MCF-7 


0.0 


Heart Pool 


0.0 


Breast ca. MDA- 
MB-231 


0.0 


Lymph Node Pool 


6.8 


Breast ca. BT 549 


0.0 


Fetal Skeletal Muscle 


0.0 
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Breast ca. T47D 


6.4 


Skeletal Muscle Pool 


0.0 


Breast ca. MDA-N 


0.0


Spleen Pool 


o ^ 

OJ 


Breast Pool 


0.0 


Thymus Pool 


0.0 


Trachea 


0.0 


CNS cancer (glio/astro) 
U87-MG 


0.0 


Lung 


0.0 


CNS cancer (glio/astro) 
U-118-MG 


10.9 


Fetal Lung 


0.0 


CNS cancer 
(neuro;met) SK-N-AS 


0.0 


Lung ca. NCI-N417


0.0 


CNS cancer (astro) SF- 
539 


0.0 


Lung ca. LX-1 


0.0 


CNS cancer (astro) 
SNB-75 


0.0 


Lung ca. NCI-H146 


77.9 


CNS cancer (glio) 
SNB-19 


0.0 


Lung ca. SHP-77 


100.0 


CNS cancer (glio) SF- 
295 


11.4 


Lung ca. A549 


10.1 


Brain (Amygdala) Pool 


0.0 


Lung ca. NCI-H526


0.0


Brain (cerebellum)


0.0


Lung ca. NCI-H23


34.4


Brain (fetal)


0.0


Lung ca. NCI-H460


30.6 


Brain (Hippocampus)
Pool


0.0 


Lung ca. HOP-62 


0.0


Cerebral Cortex Pool 


0.0 


Lung ca. NCI-H522 


0.0 


Brain (Substantia nigra) 
Pool 


0.0 


Liver 


0.0


Brain (Thalamus) Pool




Fetal Liver


0.0


Brain (whole)


50.0


Liver ca. HepG2


0.0


Spinal Cord Pool


0.0


Kidney Pool


3.0


Adrenal Gland


0.0


Fetal Kidney


8.4


Pituitary gland Pool


0.0


Renal ca. 786-0


0.0 


Salivary Gland 


0.0 


Renal ca. A498 


0.0 


Thyroid (female) 


0.0 


Renal ca. ACHN 


0.0 


Pancreatic ca. CAPAN2 


0.0 


Renal ca. UO-31


0.0 


Pancreas Pool 


0.0 



Table DG. Panel 1.2 



Tissue
Name


Rel.
Exp.(%)
Ag1130,
Run
125117140


Rel.
Exp.(%)
Ag1130,
Run
126566764


Rel.
Exp.(%)
Ag1198,
Run
129140506


Tissue
Name


Rel.
Exp.(%)
Ag1130,
Run
125117140


Rel.
Exp.(%)
Ag1130,
Run
126566764


Rel.
Exp.(%)
Ag1198,
Run
129140506


Endothelial 
cells 


0.0


0.0 


0.0 


Renal ca. 
786-0 


0.0 


0.0 


0.0 
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Heart 
(Fetal) 


0.0 


0.0 


0.0 


Renal ca. 
A498 


7.3 


4.7 


0.0 


Pancreas 


0.0 


0.0 


0.0 


Renal ca. 
RXF393 


0.0 


0.0 


0.0 


Pancreatic
ca. CAPAN2


9.0 


0.0


0.0 


Renal ca. 
ACHN 


0.0 


0.0 


0.0 


Adrenal 
Gland 


0.0 


2.6 


0.0 


Renal ca. 
UO-31 


3.9 


0.0 


0.0 


Thyroid 


0.0 


0.0 


0.0 


Renal ca.
TK-10


0.0 


0.0 


0.0 


Salivary 
gland 


0.0 


0.0 


0.0 


Liver 


26.6 


0.0 


0.0 


Pituitary 
gland 


0.0 


0.0 


0.0 


Liver
(fetal) 


25.3 


0.0 


0.0 


Brain 
(fetal) 


0.0 


0.0 


0.0 


Liver ca.
(hepatoblast) HepG2


0.0 


0.0 


0.0 


Brain 
(whole) 


2.6 


20.0 


0.0 


Lung 


0.0 


0.0 


0.0 


Brain 

(amygdala) 


1.3 


32.1 


0.0 


Lung 

(fetal)


0.0 


0.0 


0.0 


Brain
(cerebellum)


1.5 


3.8 


0.0 


Lung ca. 
(small 
cell) LX-1 


3.4 


0.0 


0.0 


Brain
(hippocampus)


0.0


27.0 


0.0 


Lung ca.
(small
cell) NCI-H69


28.5 


74.2 


0.0 


Brain 
(thalamus) 


9.9


22.5 


9.8 


Lung ca. 
(s.cell 
var.) 
SHP-77 


3.8 


9.7 


0.0 


Cerebral 
Cortex 


0.0 


0.0 


0.0 


Lung ca.
(large
cell) NCI-H460


8.8 


4.1 


5.3 


Spinal cord 


4.4 


0.0 


0.0 


Lung ca. 
(non-sm. 
cell) A549 


51.4 


9.5 


7.2 


glio/astro 
U87-MG 


0.0


0.0 


0.0 


Lung ca. 
(non- 
s.cell) 
NCI-H23 


0.0 


0.0 


0.0 


glio/astro 
U-118-MG 


0.0 


0.0 


0.0 


Lung ca. 

(non- 

s.cell) 


8.4 


2.7 


9.6 
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HOP-62 








astrocytoma
SW1783


2.9 


0.0 


0.0 


Lung ca.
(non-s.cl)
NCI-H522


0.0 


0.0 


0.0 


neuro*; met 
SK-N-AS 


0.0 


0.0 


0.0 


Lung ca.
(squam.)
SW 900


3.2 


8.7 


0.0 


astrocytoma
SF-539


5.1 


0.0 


0.0 


Lung ca.
(squam.)
NCI-H596


2.3 


15.9 


0.0 


astrocytoma
SNB-75


2.3 


0.0 


0.0 


Mammary
gland 


0.0 


0.0 


0.0 


glioma 
SNB-19 


6.3 


20.7 


9.0 


Breast
ca.*
(pl.ef)
MCF-7


0.0 


0.0 


0.0 


glioma 
U251 


1.4 


0.0 


1.8 


Breast
ca.*
(pl.ef)
MDA-
MB-231


0.0 


0.0 


0.0 


glioma SF- 
295 


0.0 


0.0 


0.0 


Breast
ca.* (pl.
ef) T47D


14.1 


37.4 


0.0 


Heart 


0.0 


0.0 


0.0 


Breast ca. 
BT-549 


12.5 


21.0 


12.3 


Skeletal 
Muscle 


2.3


0.0 


0.0 


Breast ca.
MDA-N


0.0 


0.0 


0.0 


Bone 
marrow 


0.0 


0.0 


0.0 


Ovary 


0.0 


0.0 


0.0 


Thymus 


0.0


0.0 


0.0 


Ovarian
ca.
OVCAR-3


0.0 


0.0 


0.0 


Spleen 


2.2 


0.0 


0.0 


Ovarian 
ca. 

OVCAR- 
4 


0.0 


0.0 


0.0 


Lymph 
node 


0.0 


0.0 


0.0 


Ovarian
ca.
OVCAR-5


66.9 


35.4 


4.4 


Colorectal 
Tissue 


11.3 


27.7


21.8 


Ovarian 
ca. 

OVCAR- 


2.7 


0.0 


0.0 
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8 








Stomach 


0.0 


0.0 


0.0


Ovarian 
ca. 

IGROV-1


6.0 


0.0 


0.0 


Small 
intestine 


5.4 


0.0 


0.0 


Ovarian
ca.
(ascites)
SK-OV-3


30.8 


0.0 


0.0 


Colon ca. 
SW480 


3.2 


0.0 


0.0 


Uterus 


0.0 


0.0 


0.0 


Colon ca.*
SW620 
(SW480 
met) 


0.0


0.0 


0.0 


Placenta 


0.0 


0.0 


0.0 


Colon ca. 
HT29 


1.9 


14.4 


0.0 


Prostate 


6.9 


0.0 


0.0 


Colon ca. 
HCT-116 


0.0 


0.0 


0.0 


Prostate 
ca.* (bone 
met) PC-3


100.0 


0.0 


0.0 


Colon ca. 
CaCo-2 


0.0 


0.0 


0.0 


Testis 


54.7 


100.0 


36.9 


Colon ca. 

Tissue 

(OD03866) 


72.2 


75.8


100.0 


Melanoma
Hs688(A).T


4.2 


0.0 


0.0 


Colon ca. 
HCC-2998 


5.3 


4.8 


0.0 


Melanoma* (met)
Hs688(B).T


2.7 


34.2 


13.3 


Gastric ca.* 
(liver met) 
NCI-N87 


50.3 


0.0 


0.0 


Melanoma
UACC-62


0.0 


0.0 


0.0 


Bladder 


6.0 


22.1 


0.0 


Melanoma
M14


31.4 


36.3 


20.2 


Trachea 


0.0 


0.0 


0.0 


Melanoma
LOX IMVI


0.0 


0.0 


0.0 


Kidney 


2.0


0.0 


0.0 


Melanoma* (met)
SK-MEL-5


2.4 


0.0 


0.0 


Kidney 
(fetal) 


1.1 


2.5 


0.0 









Table DH. Panel 4R



Tissue Name


Rel. Exp.(%)
Ag1198, Run
142014937


Tissue Name


Rel. Exp.(%)
Ag1198, Run
142014937


Secondary Th1 act


0.0


HUVEC IL-1beta


0.0


Secondary Th2 act


0.0


HUVEC IFN gamma




Secondary Tr1 act


2.5


HUVEC TNF alpha + IFN
gamma


0.0


Secondary Th1 rest


0.0


HUVEC TNF alpha + IL4


0.0


Secondary Th2 rest


0.0


HUVEC IL-11


0.0


Secondary Tr1 rest


0.0 


Lung Microvascular EC 
none 


0.0 


Primary Th1 act


0.0


Lung Microvascular EC
TNFalpha + IL-1beta


0.0


Primary Th2 act


0.0


Microvascular Dermal EC
none


0.0


Primary Tr1 act


0.0


Microvascular Dermal EC
TNFalpha + IL-1beta


0.0


Primary Th1 rest


0.0


Bronchial epithelium
TNFalpha + IL-1beta


0.0


Primary Th2 rest


0.0


Small airway epithelium
none


0.0


Primary Tr1 rest


0.0


Small airway epithelium
TNFalpha + IL-1beta


0.0


CD45RA CD4
lymphocyte act


0.0


Coronary artery SMC rest


0.0


CD45RO CD4
lymphocyte act


0.0


Coronary artery SMC
TNFalpha + IL-1beta


0.0


CD8 lymphocyte act


0.0


Astrocytes rest


0.0


Secondary CD8
lymphocyte rest


0.0


Astrocytes TNFalpha +
IL-1beta


0.0


Secondary CD8
lymphocyte act


0.0


KU-812 (Basophil) rest


0.0


CD4 lymphocyte none


0.0


KU-812 (Basophil)
PMA/ionomycin


0.0


2ry Th1/Th2/Tr1_anti-


0.0


CCD1106 (Keratinocytes)
none


0.0


LAK cells rest


0.0


CCD1106 (Keratinocytes)
TNFalpha + IL-1beta


0.0 


LAK cells IL-2


0.0


Liver cirrhosis 


1 £ A 
1 0.4 


LAK cells IL-2+IL-12


0.0


Lupus kidney


0.0


LAK cells IL-2+IFN 
gamma 


0.0 


NCI-H292 none 


0.0 


LAK cells IL-2+IL-18


0.0


NCI-H292 IL-4


0.0


LAK cells 
PMA/ionomycin 


0.0 


NCI-H292 IL-9 


0.0 


NK Cells IL-2 rest


0.0 


NCI-H292 IL-13 


0.0 
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i wo w ay ivii^iv j day 




NCI-H292 IFN gamma


0.0


Two Way MLR 5 day






0.0


Two Way MLR 7 day


0.0


HPAEC TNF alpha + IL-1
beta


0.0


PBMC rest




Lung fibroblast none


0.0


PBMC PWM


0.0


Lung fibroblast TNF alpha
+ IL-1 beta


0.0




0.0


Lung fibroblast IL-4




Ramos (B cell) none 


0.0 


Lung fibroblast IL-9 


0.0 


Ramos (B cell) 
ionomycin 


0.0 


Lung fibroblast IL-13


0.0 


B lymphocytes PWM 


0.0 


Lung fibroblast IFN
gamma 


0.0 


B lymphocytes CD40L
and IL-4 


0.0 


Dermal fibroblast 
CCD1070 rest


0.0 


EOL-1 dbcAMP 


0.0 


Dermal fibroblast 
CCD1070 TNF alpha 


0.0 


EOL-1 dbcAMP
PMA/ionomycin


0.0 


Dermal fibroblast 


0.0 


Dendritic cells none 


0.0 


Dermal fibroblast IFN
gamma 


0.0 


Dendritic cells LPS


0.0


Dermal fibroblast IL-4


0.0


Dendritic cells anti-
CD40


0.0 


IBD Colitis 1 


100.0 


Monocytes rest 


0.0 


IBD Colitis 2 


0.0 


Monocytes LPS 


0.0 


IBD Crohn's 


0.0 


Macrophages rest 


0.0 


Colon 


0.0 


Macrophages LPS 


0.0 


Lung 


0.0 


HUVEC none 


0.0 


Thymus 


0.0 


HUVEC starved 


0.0 


Kidney 


0.0 



CNS_neurodegeneration_v1.0 Summary: Ag3363 - Expression of this gene is
low/undetectable (CTs > 35) across all of the samples on this panel (data not shown). 



General_screening_panel_v1.4 Summary: Ag3363 - Significant expression is seen in lung
cancer cell line NCI-H146 (CT=34.5) and lung cancer cell line SHP-77 (CT=34.2). Therefore,
expression of this gene can be used to distinguish these samples from the rest of the samples on this
panel.

Panel 1.2 Summary: Ag1130/Ag1198 - Three different runs using the same primer
sequences yield similar results. Significant expression of this gene is seen in testis and a colon 
cancer sample. Therefore, expression of this gene can be used to differentiate these samples 
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from other samples on these panels. Results from a third experiment using the probe and 
primer set Ag1253 show low/undetectable levels of expression in all the samples on this panel.

Panel 1.3D Summary: Ag1253 - Expression of this gene is low/undetectable (CTs > 35)
across all of the samples on this panel (data not shown). 

Panel 2D Summary: Ag1603 - Expression of this gene is low/undetectable (CTs > 35) across
all of the samples on this panel (data not shown). 

Panel 4D Summary: Ag1130/Ag1198/Ag1253/Ag3363 - Two experiments showed possible
experimental difficulties, while the other three runs showed expression of this gene as 
low/undetectable (CTs > 35) across all of the samples on the panel. 

Panel 4R Summary: Ag1198 - Significant expression of this gene is seen only in the IBD
colitis 1 sample (CT=34.2). Therefore, expression of this gene can be used to differentiate this 
sample from others on the panel. 

Panel CNS_1 Summary: Ag1253/Ag1603 - Expression of this gene is low/undetectable
(CTs > 35) across all of the samples on this panel (data not shown). 

E. CG58516-01: G-protein beta WD-40 repeats

Expression of gene CG58516-01 was assessed using the primer-probe set Ag3362, 
described in Table EA. Results of the RTQ-PCR runs are shown in Tables EB and EC. 

Table EA. Probe Name Ag3362

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-gtcgggcaggacctttact-3' | 19 | 1474 | 381
Probe | TET-5'-tcctacagctaattctgcagggcaca-3'-TAMRA | 26 | 1498 | 382
Reverse | 5'-tacgctttactcccgtaagtca-3' | 22 | 1543 | 383



Table EB. CNS_neurodegeneration_v1.0



Tissue Name


Rel. Exp.(%) Ag3362,
Run 210153738


Tissue Name


Rel. Exp.(%) Ag3362,
Run 210153738


AD 1 Hippo


9.9


Control (Path) 3 
Temporal Ctx 


0.0 
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AD 2 Hippo 


33.2 


Control (Path) 4
Temporal Ctx


24.3


AD 3 Hippo




AD 1 Occipital Ctx


2.0


AD 4 Hippo


16.5


AD 2 Occipital Ctx
(Missing)


0.0


AD 5 Hippo


96.6


AD 3 Occipital Ctx


5.4


AD 6 Hippo


43.2


AD 4 Occipital Ctx


24.7


Control 2 Hippo




AD 5 Occipital Ctx




Control 4 Hippo


16.6


AD 6 Occipital Ctx


31.9


Control (Path) 3
Hippo


3.8


Control 1 Occipital
Ctx


0.9


AD 1 Temporal Ctx


7.1


Control 2 Occipital
Ctx


89.5


AD 2 Temporal Ctx


23.2


Control 3 Occipital
Ctx


12.6


AD 3 Temporal Ctx


5.6


Control 4 Occipital
Ctx


6.3


AD 4 Temporal Ctx


20.0


Control (Path) 1
Occipital Ctx


65.1


AD 5 Inf Temporal
Ctx


100.0


Control (Path) 2
Occipital Ctx


15.8


AD 5 Sup Temporal
Ctx


43.8


Control (Path) 3
Occipital Ctx


2.0


AD 6 Inf Temporal
Ctx


30.8


Control (Path) 4
Occipital Ctx


11.6


AD 6 Sup Temporal
Ctx


69.7


Control 1 Parietal
Ctx


2.8


Control 1 Temporal
Ctx


9.0


Control 2 Parietal
Ctx


39.2


Control 2 Temporal
Ctx


59.0


Control 3 Parietal
Ctx


23.5


Control 3 Temporal
Ctx


11.7


Control (Path) 1
Parietal Ctx


69.7


Control 4 Temporal
Ctx


8.2


Control (Path) 2
Parietal Ctx


14.9


Control (Path) 1
Temporal Ctx


56.3


Control (Path) 3
Parietal Ctx


0.9


Control (Path) 2
Temporal Ctx


34.2


Control (Path) 4
Parietal Ctx


38.7 



Table EC. General_screening_panel_v1.4



Tissue Name 


Rel. Exp.(%) Ag3362,
Run 216523482


Tissue Name


Rel. Exp.(%) Ag3362,
Run 216523482


Adipose


6.3


Renal ca. TK-10


44.1 
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Melanoma* 
Hs688(A).T


17.6 


Bladder 


9.4 


Melanoma* 
Hs688(B).T


18.3 


Gastric ca. (liver met.)
NCI-N87


21.6 


Melanoma* M14 


17.1 


Gastric ca. KATO III 


17.6 


Melanoma* 
LOXIMVI 


13.6 


Colon ca. SW-948 


5.8 


Melanoma* SK-
MEL-5


19.6 


Colon ca. SW480 


34.6 


Squamous cell
carcinoma SCC-4


14.6 


Colon ca.* (SW480
met) SW620


14.2 


Testis Pool


A ft 


Colon ca. HT29


7 7 


Prostate ca.* (bone
met) PC-3 


90.8 


Colon ca. HCT-116


14.3 


Prostate Pool 


4.0 


Colon ca. CaCo-2 


19.8 


Placenta


1 1 A 


Colon cancer tissue




Uterus Pool 


2.1 


Colon ca. SW1116


9.4 


Ovarian ca. 
OVCAR-3 


17.4 


Colon ca. Colo-205 


8.8 


Ovarian ca. SK-OV-
3 


47.0 


Colon ca. SW-48 


13.2 


Ovarian ca.
OVCAR-4 


14.7 


Colon Pool 


5.7 


Ovarian ca. 
OVCAR-5 


31.6 


Small Intestine Pool 


10.2 


Ovarian ca. IGROV-
1


12.9 


Stomach Pool 


6.2 


Ovarian ca.
OVCAR-8 


6.7 


Bone Marrow Pool 


1.3 


Ovary


1 O ^ 


Fetal Heart


1.1


Breast ca. MCF-7


/ J.O 


Heart Pool


1 A 


Breast ca. MDA-
MB-231 


30.4


Lymph Node Pool 


8.7 


Breast ca. BT 549 


65.5 


Fetal Skeletal Muscle 


2.3 


Breast ca. T47D 


100.0 


Skeletal Muscle Pool 


9.4 


Breast ca. MDA-N


11 A 


Spleen Pool


A fi 


Breast Pool 


4.6 


Thymus Pool 


7.3 


Trachea 


7.7 


CNS cancer (glio/astro) 
U87-MG 


33.9 


Lung 


4.9 


CNS cancer (glio/astro)
U-118-MG 


27.2 


Fetal Lung 


7.1 


CNS cancer 
(neuro;met) SK-N-AS 


16.0 


Lung ca. NCI-N417


9.3 


CNS cancer (astro) SF- 
539 


14.3 
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Lung ca. LX-1 


15.8 


CNS cancer (astro) 
SNB-75 


60.7


Lung ca. NCI-H146 


4.9 


CNS cancer (glio) 


13.8 


Lung ca. SHP-77 


16.5 


CNS cancer (glio) SF-
295 


28.5 


Lung ca. A549 


27.2 


Brain (Amygdala) Pool 


5.3 


Lung ca. NCI-H526


A 1 


Brain (cerebellum)




Lung ca. NCI-H23




Brain (fetal)


16.4


Lung ca. NCI-H460 


9.5 


Brain (Hippocampus)
Pool


5.5 


Lung ca. HOP-62


7 f\ 


Cerebral Cortex Pool


8.7


Lung ca. NCI-H522


18.2 


Brain (Substantia nigra)
Pool


8.3 


Liver 


0.0 


Brain (Thalamus) Pool 


6.3 


Fetal Liver 


7.3


Brain (whole) 


7.0 


Liver ca. HepG2 


29.5 


Spinal Cord Pool 


5.6 


Kidney Pool 


17.7 


Adrenal Gland 


6.3 


Fetal Kidney 


4.6 


Pituitary gland Pool 


0.8 


Renal ca. 786-0 


17.2 


Salivary Gland 


5.6 


Renal ca. A498 


5.1 


Thyroid (female) 


9.7 


Renal ca. ACHN 


17.3 


Pancreatic ca. CAPAN2 


11.7 


Renal ca.UO-31 


11.1 


Pancreas Pool 


9.2 



CNS_neurodegeneration_v1.0 Summary: Ag3362 Highest expression of the CG58516-01
gene is seen in the occipital cortex of a control patient and the temporal cortex of an
Alzheimer's patient. While the CG58516-01 gene does not appear to be preferentially
expressed in Alzheimer's disease, this panel confirms expression of the CG58516-01 gene at
moderate/high levels in the brain in an additional set of individuals. Please see Panel 1.4 for
discussion of potential utility of this gene in the central nervous system.

General_screening_panel_v1.4 Summary: Ag3362 The CG58516-01 gene is widely
expressed in this panel, with highest expression in the breast cancer cell line T47D (CT=29). 
Significant expression is also seen in cell lines derived from prostate, breast and ovarian 
cancers. In general, expression of the CG58516-01 gene appears to be greater in the cancer 
cell lines than in normal tissue. Thus, the expression of this gene could be used to distinguish 
these cell line types from others in the panel. 

Among tissues involved in central nervous system function, this gene is expressed at 
low but significant levels in all brain regions examined. This gene encodes a protein with a 
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putative zinc-finger motif. Since these proteins are known to interact with nucleic acids, this 
suggests that this gene product may play a potential role in transcription. Thus, therapeutic 
modulation of the CG58516-01 gene product may be used to regulate the transcription of 
disease-related proteins such as ataxin, huntingtin, or various apoptosis cascade proteins. 

Among tissues with metabolic function, this gene is expressed at low levels in
pituitary, adipose, adrenal gland, pancreas, thyroid, skeletal muscle, heart, and fetal liver. This
widespread expression among these tissues suggests that this gene product may play a role in
normal neuroendocrine and metabolic function and that dysregulated expression of this gene may
contribute to neuroendocrine disorders or metabolic diseases, such as obesity and diabetes.

References: 

1. Zhu W, Chan EK, Li J, Hemmerich P, Tan EM. (2001) Transcription activating 
property of autoantigen SG2NA and modulating effect of WD-40 repeats. Exp Cell Res. 
269(2):312-21.

Panel 4D Summary: Ag3362 Results from one experiment with the CG58516-01 gene are
not included because the amp plot corresponding to the run indicates that there were problems 
with the experiment. 

F. CG58473-01: PROTEIN KINASE 

Expression of gene CG58473-01 was assessed using the primer-probe set Ag3357, 
described in Table FA. Results of the RTQ-PCR runs are shown in Tables FB and FC. 

Table FA. Probe Name Ag3357

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-gtcaaggtggccctaaaattc-3' | 21 | 853 | 384
Probe | TET-5'-ccaggacctcatctccaagctgctta-3'-TAMRA | 26 | 897 | 385
Reverse | 5'-agccgttctgaggggttat-3' | 19 | 926 | 386



Table FB. General_screening_panel_v1.4



Tissue Name 



Rel. Exp.(%) Ag3357, 
Run 216523477 



Tissue Name 



Rel. Exp.(%) Ag3357, 
Run 216523477 
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Adipose 


0.0 


Renal ca. TK-10 


13.2 


Melanoma* 
Hs688(A).T


0.0 


Bladder 


7.2 


Melanoma* 
Hs688(B).T


1.1 


Gastric ca. (liver met.)
NCI-N87


5.4 


Melanoma* M14


50.0 


Gastric ca. KATO III


49.0 


Melanoma* 
LOXIMVI 


33.0


Colon ca. SW-948 


14.9 


Melanoma* SK-
MEL-5


24.7 


Colon ca. SW480 


95.9 


Squamous cell 
carcinoma 


11.6 


Colon ca.* (SW480
met) SW620


53.6 


Testis Pool


o.z 


Colon ca. HT29


1 C\ 1 
IU.j 


Prostate ca.* (bone
met) PC-3 


3.2 


Colon ca. HCT-116 


76.3 


Prostate Pool 


0.0 


Colon ca. CaCo-2 


14.1 


Placenta 


LA 


Colon cancer tissue


0.-) 


Uterus Pool 


0.0 


Colon ca. SW1116


18.6 


Ovarian ca. 
OVCAR-3 


51.1 


Colon ca. Colo-205 


24.3 


Ovarian ca. SK-OV- 
3 


53.2 


Colon ca. SW-48 


26.1 


Ovarian ca. 
OVCAR-4 


10.4 


Colon Pool 


4.6 


Ovarian ca. 
OVCAR-5 


12.3 


Small Intestine Pool 


1.7 


Ovarian ca. IGROV- 
1


10.1 


Stomach Pool 


1.2 


Ovarian ca. 
OVCAR-8 


13.4 


Bone Marrow Pool 


1.0 


Ovary


0.0


Fetal Heart


0.0


Breast ca. MCF-7


on 1 
zU.J 


Heart Pool


0.0


Breast ca. MDA-
MB-231 


65.1 


Lymph Node Pool 


1.4 


Breast ca. BT 549


100.0 


Fetal Skeletal Muscle 


0.0 


Breast ca. T47D 


34.2 


Skeletal Muscle Pool 


1.6 


Breast ca. MDA-N




Spleen Pool


1 A 

5 A 


Breast Pool 


1.3 


Thymus Pool 


4.7 


Trachea


0.0 


CNS cancer (glio/astro) 
U87-MG 


7.8 


Lung 


0.0 


CNS cancer (glio/astro)
U-118-MG 


54.0 


Fetal Lung 


5.0 


CNS cancer 
(neuro;met) SK-N-AS 


7.9 
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Lung ca.NCI-N417 


17.9 


CNS cancer (astro) SF- 
539 


22.4 


Lung ca. LX-1 


28.5 


CNS cancer (astro) 
SNB-75 


19.2 


Lung ca. NCI-H146


74.7 


CNS cancer (glio)
SNB-19


14.6 


Lung ca. SHP-77 


40.6 


CNS cancer (glio) SF-
295 


3.0 


Lung ca. A549 


64.6 


Brain (Amygdala) Pool 


0.0 


Lung ca. NCI-H526




Brain (cerebellum) 


0.0


Lung ca. NCI-H23


oi. / 


Brain (fetal)


0.0


Lung ca. NCI-H460 


0.8 


Brain (Hippocampus)
Pool


0.0 


Lung ca. HOP-62 


2.0 


Cerebral Cortex Pool 


0.0


Lung ca. NCI-H522 


34.4 


Brain (Substantia nigra) 
Pool 


2.6 


Liver 


0.0


Brain (Thalamus) Pool


9.3


Fetal Liver


0.0


Brain (whole)


2.5


Liver ca. HepG2


11.4


Spinal Cord Pool


0.0


Kidney Pool


0.0


Adrenal Gland


0.0


Fetal Kidney


3.1


Pituitary gland Pool 


1.4 


Renal ca. 786-0


20.0 


Salivary Gland 


0.0 


Renal ca. A498 




3.6


Thyroid (female) 


0.0 


Renal ca. ACHN 


18.9


Pancreatic ca. CAPAN2


20.4 


Renal ca. UO-31 




10.4


Pancreas Pool




1.3 


Table FC. Panel 4D 


Tissue Name 


Rel. Exp.(%) 
Ag3357, Run 
165231196 


Tissue Name 


Rel. Exp.(%) 
Ag3357, Run 
165231196 


Secondary Th1 act


9.0


HUVEC IL-1beta


9.5 


Secondary Th2 act 


43.2 


HUVEC IFN gamma 


6.3 


Secondary Tr1 act


46.0 


HUVEC TNF alpha + IFN 
gamma 


7.3 


Secondary Th1 rest


6.7 


HUVEC TNF alpha + IL4 


25.3 


Secondary Th2 rest 


12.2 


HUVEC IL-11 


13.1 


Secondary Tr1 rest


1.9


Lung Microvascular EC
none


3.3


Primary Th1 act


6.1


Lung Microvascular EC
TNFalpha + IL-1beta


7.1 


Primary Th2 act 


21.8 


Microvascular Dermal EC 
none 


10.9 


Primary Tr1 act


33.0


Microvascular Dermal EC


7.3 
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1 


TNFalpha + IL-1beta




Primary Th1 rest


28.1


Bronchial epithelium 
TNFalpha + IL-1beta


1.9 


Primary Th2 rest


12.1


Small airway epithelium 
none 


3.6 


Primary Tr1 rest


29.7


Small airway epithelium 
TNFalpha + IL-1beta


36.3 


CD45RA CD4 
lymphocyte act 


28.5 


Coronary artery SMC rest


0.0 


CD45RO CD4 
lymphocyte act 


39.8 


Coronary artery SMC
TNFalpha + IL-1beta


0.0 


CD8 lymphocyte act 


18.6 


Astrocytes rest 


1.4 


Secondary CD8 
lymphocyte rest 


26.8 


Astrocytes TNFalpha + 
IL-1beta


1.2 


Secondary CD8 
lymphocyte act 


19.2 


KU-812 (Basophil) rest


18.2 


CD4 lymphocyte none 


10.6 


KU-812 (Basophil)
PMA/ionomycin 


30.4 


2ry Th1/Th2/Tr1_anti-
CD95 CH11


15.6 


CCD1106 (Keratinocytes)
none 


18.3 


LAK cells rest 


0.0 


CCD1106 (Keratinocytes)
TNFalpha + IL-1beta


7.7 


LAK cells IL-2 


42.6 


Liver cirrhosis 


25 J 


LAK cells IL-2+IL-12




Lupus kidney 


0.0 


T rf»11c TT -?-r-TF>J 

JLr/VTk. L/C115> li^-X "^UTIN 

gamma 


24.8 


NCI-H292 none 


7.8 


T AK rplk TT -?+ TT -1 8 i 




XVT TJOfil TT A 


26.4 


PMA/ionomycin 


0.0 


NCI-H292 IL-9 


29.7 


NK Cells IL-2 rest 


23.5 


NCI-H292 IL-13 


20.7 


i wo w dy ivijLsiv J tidy 


1 j./ 


NUl-rizyz irN gamma 


27.9 


Two Way MLR 5 day ■ 


13.2 


LTD ACP 

rlr AbC none 


8.6 


Two Way MLR 7 day j 


11.7 


HPAEC TNF alpha + IL-1 
□eta 


2.4 


rBML rest 


0.0 


Lung fibroblast none 


5.5 


PBMC PWM 


52.1 


Lung iibroblast TNF alpha 
+ IL-1 beta 


2.2 


PBMC PHA-L 


14.6 


Lung fibroblast IL-4 


0.0 


Ramos (B cell) none 


16.5 


Lung fibroblast IL-9 


0.0 


Ramos (B cell) 
ionomycin 


14.7 


Lung fibroblast IL-13 


0.0 


B lymphocytes PWM 


100.0 


Lung fibroblast IFN 
gamma 


0.0 


B lymphocytes CD40L 
and IL-4 


10.4 


Dermal fibroblast 
CCD 1070 rest 


40.1 
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EOL-l dbcAMP 


9.9 


Dermal fibroblast j 
CCD1070 TNF alpha | 


43.8 


EOL-l dbcAMP 

"DTVyf A /ir\nr\mvpiri 
i lYLTi/ luiiuiiiyuiii 


13.2 


Dermal fibroblast j 
PCD 1 070 IL- 1 beta 


23.5 


Dendritic cells none 


4.7 


Dermal fibroblast IFN j 


3.7 


ucnuniic ecus i^r o 


i i 
l.i 


Plprmal fiViroKlaQt TT -A 1 


4 6 


Ucnunuc ceiis aim- 
CD40 


0.0 


IBD Colitis 2 | 


0.0 


Monocytes rest 


0.0 


IBD Crohn's j 


0.0 


Monocytes LPS 


0.0 


Colon 

. I 


28.1 


Macrophages rest 


4.3 


Lung 1 


59.0 


Macrophages LPS 


0.0 


Thymus j 


0.0 


HUVEC none 


28.3 


Kidney j 


10.0 


HUVEC starved 


25.3 





CNS_neurodegeneration_v1.0 Summary: Ag3357 - Expression of this gene is 
low/undetectable (CTs > 35) across all of the samples on this panel (data not shown). 



General_screening_panel_v1.4 Summary: Ag3357 - This gene is primarily expressed in 
cancer cell lines, with highest expression in the breast cancer cell line BT 549 (CT=32.8). This 
gene is expressed in gastric, brain, colon, lung, breast and ovarian cancer cell lines and in 
melanoma cell lines, but not in the corresponding healthy tissues. Thus, expression of this gene could 
be used as a diagnostic marker for the presence of these cancers. Furthermore, therapeutic 
inhibition using antibodies or small molecule drugs might be of use in the treatment of these 
cancers. 

Panel 4D Summary: Ag3357 - Highest expression of the CG58473-01 gene is seen in 
pokeweed mitogen-activated purified peripheral blood B lymphocytes (CT=33.2). In addition, 
no expression of the transcript is seen in PBMC that contain normal B cells, but the transcript 
is induced when PBMC are treated with the B cell-selective pokeweed mitogen. The transcript 
is not seen in the B cell lymphoma cell line Ramos regardless of stimulation. Thus, the 
putative protein encoded by this gene could potentially be used diagnostically to identify 
activated B cells. Therefore, therapeutics that antagonize the function of this gene product may 
be useful in reducing or eliminating the symptoms in patients with 
autoimmune and inflammatory diseases in which B cells play a part in the initiation or 
progression of the disease process, such as lupus erythematosus, Crohn's disease, ulcerative 
colitis, multiple sclerosis, chronic obstructive pulmonary disease, asthma, emphysema, 
rheumatoid arthritis, or psoriasis. 

G. CG58470-01: UDP-N-ACETYLHEXOSAMINE PYROPHOSPHORYLASE 

Expression of gene CG58470-01 was assessed using the primer-probe set Ag5940, 
described in Table GA. 

Table GA. Probe Name Ag5940

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-atatcctgaagctacaacagttagct-3' | 26 | 422 | 387
Probe | TET-5'-tggcaacaaatgcattattccatattacg-3'-TAMRA | 29 | 459 | 388
Reverse | 5'-gagtgaactcgctggtcatg-3' | 20 | 489 | 389
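
The Length column of Table GA can be checked directly against the listed sequences. The sketch below is only a consistency check written for illustration; the tuple layout and the helper names `check_lengths` and `starts_in_order` are assumptions and are not part of the method described in this document.

```python
# (role, sequence 5'->3', listed length, listed start position), copied from Table GA.
AG5940 = [
    ("Forward", "atatcctgaagctacaacagttagct", 26, 422),
    ("Probe", "tggcaacaaatgcattattccatattacg", 29, 459),
    ("Reverse", "gagtgaactcgctggtcatg", 20, 489),
]


def check_lengths(rows):
    """Return the roles whose sequence length differs from the listed length."""
    return [role for role, seq, listed, _start in rows if len(seq) != listed]


def starts_in_order(rows):
    """True if start positions increase from forward primer to probe to reverse primer."""
    starts = [start for _role, _seq, _listed, start in rows]
    return starts == sorted(starts)


print(check_lengths(AG5940))    # roles with a length mismatch, if any
print(starts_in_order(AG5940))  # whether the probe start lies between the primer starts
```

The same check can be applied to the other primer-probe tables by substituting their rows.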



General_screening_panel_v1.5 Summary: Ag5940 - Expression of this gene is 
low/undetectable (CTs > 35) across all of the samples on this panel (data not shown). 



Panel 5 Islet Summary: Ag5940 - Expression of this gene is low/undetectable (CTs > 35) 
across all of the samples on this panel (data not shown). 

H. CG58593-01: UBIQUITIN-52 



Expression of gene CG58593-01 was assessed using the primer-probe set Ag3421, 
described in Table HA. 

Table HA. Probe Name Ag3421

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-atctgctgcaagtgctatgc-3' | 20 | 291 | 390
Probe | TET-5'-cggtgctatcaactgccacaagaaga-3'-TAMRA | 26 | 323 | 391
Reverse | 5'-tgaccttcttcctggggtac-3' | 20 | 371 | 392



CNS_neurodegeneration_v1.0 Summary: Ag3421 - Expression of this gene is 
low/undetectable (CTs > 35) across all of the samples on this panel (data not shown). 

General_screening_panel_v1.4 Summary: Ag3421 - Expression of this gene is 
low/undetectable (CTs > 35) across all of the samples on this panel (data not shown). 

Panel 4D Summary: Ag3421 - Expression of this gene is low/undetectable (CTs > 35) across 
all of the samples on this panel (data not shown). 



I. CG57871-01: TOUSLED-LIKE KINASE 

Expression of gene CG57871-01 was assessed using the primer-probe set Ag3351, 
described in Table IA. Results of the RTQ-PCR runs are shown in Tables IB and IC. 

Table IA. Probe Name Ag3351

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-gatcctcactgcaacattcttt-3' | 22 | 346 | 393
Probe | TET-5'-aatcccttaccgcgacgagtagaaca-3'-TAMRA | 26 | 372 | 394
Reverse | 5'-gcactgccatctaaaccataga-3' | 22 | 403 | 395


Table IB. CNS_neurodegeneration_v1.0

Tissue Name | Rel. Exp.(%) Ag3351, Run 210141594 | Tissue Name | Rel. Exp.(%) Ag3351, Run 210141594
AD 1 Hippo | 10.4 | Control (Path) 3 Temporal Ctx | 3.0
AD 2 Hippo | 33.4 | Control (Path) 4 Temporal Ctx | 65.1
AD 3 Hippo | 5.5 | AD 1 Occipital Ctx | 20.2
AD 4 Hippo | 8.4 | AD 2 Occipital Ctx (Missing) | 0.0
AD 5 hippo | 100.0 | AD 3 Occipital Ctx | 3.8
AD 6 Hippo | 33.4 | AD 4 Occipital Ctx | 45.1
Control 2 Hippo | 29.9 | AD 5 Occipital Ctx | 15.2
Control 4 Hippo | 6.7 | AD 6 Occipital Ctx | 46.7
Control (Path) 3 Hippo | 3.7 | Control 1 Occipital Ctx | 2.7
AD 1 Temporal Ctx | 16.8 | Control 2 Occipital Ctx | 52.5
AD 2 Temporal Ctx | 45.1 | Control 3 Occipital Ctx | 45.4
AD 3 Temporal Ctx | 6.9 | Control 4 Occipital Ctx | 6.3
AD 4 Temporal Ctx | 54.0 | Control (Path) 1 Occipital Ctx | 79.0
AD 5 Inf Temporal Ctx | 92.0 | Control (Path) 2 Occipital Ctx | 34.4
AD 5 Sup Temporal Ctx | 13.0 | Control (Path) 3 Occipital Ctx | 0.8
AD 6 Inf Temporal Ctx | 48.6 | Control (Path) 4 Occipital Ctx | 40.6
AD 6 Sup Temporal Ctx | 56.6 | Control 1 Parietal Ctx | 6.9
Control 1 Temporal Ctx | 6.2 | Control 2 Parietal Ctx | 48.0
Control 2 Temporal Ctx | 29.3 | Control 3 Parietal Ctx | 26.1
Control 3 Temporal Ctx | 32.8 | Control (Path) 1 Parietal Ctx | 73.7
Control 4 Temporal Ctx | 13.9 | Control (Path) 2 Parietal Ctx | 57.4
Control (Path) 1 Temporal Ctx | 79.6 | Control (Path) 3 Parietal Ctx | 3.4
Control (Path) 2 Temporal Ctx | 97.3 | Control (Path) 4 Parietal Ctx | 78.5



Table IC. Panel 4D

Tissue Name | Rel. Exp.(%) Ag3351, Run 165222896 | Tissue Name | Rel. Exp.(%) Ag3351, Run 165222896
Secondary Th1 act | 16.5 | HUVEC IL-1beta | 15.4
Secondary Th2 act | 26.4 | HUVEC IFN gamma | 13.5
Secondary Tr1 act | 23.3 | HUVEC TNF alpha + IFN gamma | 17.0
Secondary Th1 rest | 6.0 | HUVEC TNF alpha + IL4 | 11.0
Secondary Th2 rest | 10.7 | HUVEC IL-11 | 5.4
Secondary Tr1 rest | 2.1 | Lung Microvascular EC none | 12.4
Primary Th1 act | 19.2 | Lung Microvascular EC TNFalpha + IL-1beta | 9.6
Primary Th2 act | 17.6 | Microvascular Dermal EC none | 14.7
Primary Tr1 act | 36.1 | Microvascular Dermal EC TNFalpha + IL-1beta | 14.8
Primary Th1 rest | 55.5 | Bronchial epithelium TNFalpha + IL1beta | 14.1
Primary Th2 rest | 43.8 | Small airway epithelium none | 7.7
Primary Tr1 rest | 15.9 | Small airway epithelium TNFalpha + IL-1beta | 50.3
CD45RA CD4 lymphocyte act | 13.0 | Coronary artery SMC rest | 15.6
CD45RO CD4 lymphocyte act | 21.0 | Coronary artery SMC TNFalpha + IL-1beta | 6.1
CD8 lymphocyte act | 12.9 | Astrocytes rest | 11.5
Secondary CD8 lymphocyte rest | 14.9 | Astrocytes TNFalpha + IL-1beta | 11.8
Secondary CD8 lymphocyte act | 14.8 | KU-812 (Basophil) rest | 19.2
CD4 lymphocyte none | 10.7 | KU-812 (Basophil) PMA/ionomycin | 54.0
2ry Th1/Th2/Tr1_anti-CD95 CH11 | 12.7 | CCD1106 (Keratinocytes) none | 12.2
LAK cells rest | 17.2 | CCD1106 (Keratinocytes) TNFalpha + IL-1beta | 9.0
LAK cells IL-2 | 11.4 | Liver cirrhosis | 7.4
LAK cells IL-2+IL-12 | 20.4 | Lupus kidney | 3.4
LAK cells IL-2+IFN gamma | 37.9 | NCI-H292 none | 47.6
LAK cells IL-2+IL-18 | 18.0 | NCI-H292 IL-4 |
LAK cells PMA/ionomycin | 10.5 | NCI-H292 IL-9 | 30.4
NK Cells IL-2 rest | 17.8 | NCI-H292 IL-13 | 15.7
Two Way MLR 3 day | 33.2 | NCI-H292 IFN gamma | 25.5
Two Way MLR 5 day | 10.6 | HPAEC none | 13.5
Two Way MLR 7 day | 9.9 | HPAEC TNF alpha + IL-1 beta | 17.7
PBMC rest | 12.8 | Lung fibroblast none | 11.5
PBMC PWM | 63.3 | Lung fibroblast TNF alpha + IL-1 beta | 12.4
PBMC PHA-L | 18.0 | Lung fibroblast IL-4 | 31.2
Ramos (B cell) none | 14.0 | Lung fibroblast IL-9 | 22.2
Ramos (B cell) ionomycin | 77.9 | Lung fibroblast IL-13 | 27.4
B lymphocytes PWM | 100.0 | Lung fibroblast IFN gamma | 44.8
B lymphocytes CD40L and IL-4 | 30.8 | Dermal fibroblast CCD1070 rest | 33.7
EOL-1 dbcAMP | 11.3 | Dermal fibroblast CCD1070 TNF alpha | 50.0
EOL-1 dbcAMP PMA/ionomycin | 13.7 | Dermal fibroblast CCD1070 IL-1 beta | 13.4
Dendritic cells none | 14.7 | Dermal fibroblast IFN gamma | 14.3
Dendritic cells LPS | 19.8 | Dermal fibroblast IL-4 | 25.7
Dendritic cells anti-CD40 | 14.2 | IBD Colitis 2 | 2.0
Monocytes rest | 22.5 | IBD Crohn's | 3.2
Monocytes LPS | 32.8 | Colon | 26.8
Macrophages rest | 31.0 | Lung | 14.6
Macrophages LPS | 30.8 | Thymus | 28.7
HUVEC none | 18.3 | Kidney | 45.4
HUVEC starved | 45.7 | |



CNS_neurodegeneration_v1.0 Summary: Ag3351 - This panel confirms the expression of 
this gene at low levels in the brain in an independent group of individuals. While no 
differential expression of this gene is detected between Alzheimer's diseased postmortem 
brains and those of non-demented controls, the widespread expression of this gene in the brain 
suggests that therapeutic modulation of the expression or function of this gene may be 
effective in the treatment of neurologic disorders such as Parkinson's disease, epilepsy, stroke 
and multiple sclerosis. 

General_screening_panel_v1.4 Summary: Ag3351 - Results from one experiment are not 
included. The amp plot indicates that there were experimental difficulties with this run. 

Panel 4D Summary: Ag3351 The CG57871-01 gene is expressed at high to moderate levels 
in a wide range of cell types of significance in the immune response in health and disease. 
These cells include members of the T-cell, B-cell, endothelial cell, macrophage/monocyte, and 
peripheral blood mononuclear cell family, as well as epithelial and fibroblast cell types from 
lung and skin, and normal tissues represented by colon, lung, thymus and kidney. This 
ubiquitous pattern of expression suggests that this gene product may be involved in 
homeostatic processes for these and other cell types and tissues. This pattern also suggests a 
role for the gene product in cell survival and proliferation. Therefore, modulation of the gene 
product with a functional therapeutic may lead to the alteration of functions associated with 
these cell types and lead to improvement of the symptoms of patients suffering from 
autoimmune and inflammatory diseases such as asthma, allergies, inflammatory bowel 
disease, lupus erythematosus, psoriasis, rheumatoid arthritis, and osteoarthritis. 

J. CG58590-01 and CG58590-02: PALS Guanylate kinase 

Expression of gene CG58590-01 and CG58590-02 was assessed using the 
primer-probe set Ag3380, described in Table JA. Results of the RTQ-PCR runs are shown in Tables 
JB, JC and JD. Please note that CG58590-02 represents a full-length physical clone of the 
CG58590-01 gene, validating the prediction of the gene sequence. 



Table JA. Probe Name Ag3380

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-tttgatacggcaattgtgaatt-3' | 22 | 1931 | 396
Probe | TET-5'-ccgatcttgataaagcctatcaggaa-3'-TAMRA | 26 | 1953 | 397
Reverse | 5'-cccactgaggttcagtatcaag-3' | 22 | 2000 | 398



Table JB. CNS_neurodegeneration_v1.0

Tissue Name | Rel. Exp.(%) Ag3380, Run 210153753 | Tissue Name | Rel. Exp.(%) Ag3380, Run 210153753
AD 1 Hippo | 12.9 | Control (Path) 3 Temporal Ctx | 4.7
AD 2 Hippo | 27.7 | Control (Path) 4 Temporal Ctx | 24.3
AD 3 Hippo | 4.8 | AD 1 Occipital Ctx | 15.6
AD 4 Hippo | 7.7 | AD 2 Occipital Ctx (Missing) | 0.0
AD 5 hippo | 100.0 | AD 3 Occipital Ctx | 7.5
AD 6 Hippo | 64.2 | AD 4 Occipital Ctx | 1 O 1
Control 2 Hippo | Zj.J | AD 5 Occipital Ctx |
Control 4 Hippo | 9.9 | AD 6 Occipital Ctx | 40.1
Control (Path) 3 Hippo | 8.4 | Control 1 Occipital Ctx | 4.2
AD 1 Temporal Ctx | 17.6 | Control 2 Occipital Ctx | 65.5
AD 2 Temporal Ctx | 25.3 | Control 3 Occipital Ctx | 13.4
AD 3 Temporal Ctx | 4.9 | Control 4 Occipital Ctx | 6.4
AD 4 Temporal Ctx | 17.4 | Control (Path) 1 Occipital Ctx | 78.5
AD 5 Inf Temporal Ctx | 81.8 | Control (Path) 2 Occipital Ctx | 9.4
AD 5 Sup Temporal Ctx | 42.9 | Control (Path) 3 Occipital Ctx | 3.2
AD 6 Inf Temporal Ctx | 48.6 | Control (Path) 4 Occipital Ctx | 9.9
AD 6 Sup Temporal Ctx | 53.6 | Control 1 Parietal Ctx | 6.0
Control 1 Temporal Ctx | 5.7 | Control 2 Parietal Ctx | 37.1
Control 2 Temporal Ctx | 34.6 | Control 3 Parietal Ctx | 16.5
Control 3 Temporal Ctx | 10.2 | Control (Path) 1 Parietal Ctx | 67.4
Control 4 Temporal Ctx | 7.1 | Control (Path) 2 Parietal Ctx | 18.7
Control (Path) 1 Temporal Ctx | 41.5 | Control (Path) 3 Parietal Ctx | 3.3
Control (Path) 2 Temporal Ctx | 29.5 | Control (Path) 4 Parietal Ctx | 34.4



Table JC. General_screening_panel_v1.4

Tissue Name | Rel. Exp.(%) Ag3380, Run 217043276 | Tissue Name | Rel. Exp.(%) Ag3380, Run 217043276
Adipose | | Renal ca. TK-10 | I D.J
Melanoma* Hs688(A).T | 18.9 | Bladder | 15.9
Melanoma* Hs688(B).T | 16.8 | Gastric ca. (liver met.) NCI-N87 | 52.5
Melanoma* M14 | 14.9 | Gastric ca. KATO III | 34.6
Melanoma* LOXIMVI | 21.6 | Colon ca. SW-948 | 4.9
Melanoma* SK-MEL-5 | 27.0 | Colon ca. SW480 | 82.4
Squamous cell carcinoma SCC-4 | 28.7 | Colon ca.* (SW480 met) SW620 | 20.6
Testis Pool | 5.1 | Colon ca. HT29 | 9.2
Prostate ca.* (bone met) PC-3 | 59.9 | Colon ca. HCT-116 | 20.6
Prostate Pool | 8.6 | Colon ca. CaCo-2 | 22.8
Placenta | 3.9 | Colon cancer tissue | 10.1
Uterus Pool | 1.9 | Colon ca. SW1116 | 6.2
Ovarian ca. OVCAR-3 | 32.5 | Colon ca. Colo-205 | 4.9
Ovarian ca. SK-OV-3 | 57.4 | Colon ca. SW-48 | 4.2
Ovarian ca. OVCAR-4 | 14.7 | Colon Pool | 11.4
Ovarian ca. OVCAR-5 | 59.5 | Small Intestine Pool | 9.8
Ovarian ca. IGROV-1 | 13.1 | Stomach Pool | 7.4
Ovarian ca. OVCAR-8 | 19.2 | Bone Marrow Pool | 4.2
Ovary | < o | Fetal Heart | O.J
Breast ca. MCF-7 | | Heart Pool |
Breast ca. MDA-MB-231 | 58.2 | Lymph Node Pool | 11.4
Breast ca. BT 549 | 26.8 | Fetal Skeletal Muscle | 3.3
Breast ca. T47D | 100.0 | Skeletal Muscle Pool | 8.1
Breast ca. MDA-N | <$./ | Spleen Pool | D.O
Breast Pool | 10.4 | Thymus Pool | 6.3
Trachea | 5.5 | CNS cancer (glio/astro) U87-MG | 39.2
Lung | 3.8 | CNS cancer (glio/astro) U-118-MG | 54.7
Fetal Lung | 11.8 | CNS cancer (neuro;met) SK-N-AS | 19.6
Lung ca. NCI-N417 | 3.2 | CNS cancer (astro) SF-539 | 12.2
Lung ca. LX-1 | 20.7 | CNS cancer (astro) SNB-75 | 29.7
Lung ca. NCI-H146 | 3.8 | CNS cancer (glio) SNB-19 | 13.4
Lung ca. SHP-77 | 17.9 | CNS cancer (glio) SF-295 | 28.9
Lung ca. A549 | 30.6 | Brain (Amygdala) Pool | 11.8
Lung ca. NCI-H526 | J.O | Brain (cerebellum) | o.u
Lung ca. NCI-H23 | | Brain (fetal) |
Lung ca. NCI-H460 | 14.8 | Brain (Hippocampus) Pool | 14.5
Lung ca. HOP-62 | | Cerebral Cortex Pool | lu.Z
Lung ca. NCI-H522 | 28.7 | Brain (Substantia nigra) Pool | 16.0
Liver | 0.4 | Brain (Thalamus) Pool | 22.7
Fetal Liver | 11.9 | Brain (whole) | 5.9
Liver ca. HepG2 | 12.9 | Spinal Cord Pool | 16.0
Kidney Pool | 18.4 | Adrenal Gland | 5.1
Fetal Kidney | 22.8 | Pituitary gland Pool | 3.8
Renal ca. 786-0 | 28.5 | Salivary Gland | 2.1
Renal ca. A498 | 5.0 | Thyroid (female) | 8.2
Renal ca. ACHN | 22.4 | Pancreatic ca. CAPAN2 | 51.4
Renal ca. UO-31 | 36.9 | Pancreas Pool | 12.3



Table JD. Panel 4D

Tissue Name | Rel. Exp.(%) Ag3380, Run 165296532 | Tissue Name | Rel. Exp.(%) Ag3380, Run 165296532
Secondary Th1 act | [J A | HUVEC IL-1beta | 1 J.U
Secondary Th2 act | 1 A A | HUVEC IFN gamma | i o (k
Secondary Tr1 act | 15.2 | HUVEC TNF alpha + IFN gamma | 28.3
Secondary Th1 rest | 4.0 | HUVEC TNF alpha + IL4 | ZO. i
Secondary Th2 rest | 4.7 | HUVEC IL-11 | 7.8
Secondary Tr1 rest | 8.0 | Lung Microvascular EC none | 25.5
Primary Th1 act | 14.9 | Lung Microvascular EC TNFalpha + IL-1beta | 19.5
Primary Th2 act | 13.2 | Microvascular Dermal EC none | 37.9
Primary Tr1 act | 20.7 | Microvascular Dermal EC TNFalpha + IL-1beta | 24.8
Primary Th1 rest | 35.6 | Bronchial epithelium TNFalpha + IL1beta | 37.1
Primary Th2 rest | 24.0 | Small airway epithelium none | 15.0
Primary Tr1 rest | 16.2 | Small airway epithelium TNFalpha + IL-1beta | 100.0
CD45RA CD4 lymphocyte act | 23.3 | Coronary artery SMC rest | 30.1
CD45RO CD4 lymphocyte act | 18.2 | Coronary artery SMC TNFalpha + IL-1beta | 13.6
CD8 lymphocyte act | 7.4 | Astrocytes rest | 22.5
Secondary CD8 lymphocyte rest | 13.4 | Astrocytes TNFalpha + IL-1beta | 21.2
Secondary CD8 lymphocyte act | 4.4 | KU-812 (Basophil) rest | 17.9
CD4 lymphocyte none | 8.0 | KU-812 (Basophil) PMA/ionomycin | 68.3
2ry Th1/Th2/Tr1_anti-CD95 CH11 | 10.7 | CCD1106 (Keratinocytes) none | 22.1
LAK cells rest | 13.5 | CCD1106 (Keratinocytes) TNFalpha + IL-1beta | 9.2
LAK cells IL-2 | 1 0 o | Liver cirrhosis | J.I
LAK cells IL-2+IL-12 | 1 J.Z | Lupus kidney |
LAK cells IL-2+IFN gamma | 15.6 | NCI-H292 none | 48.6
LAK cells IL-2+IL-18 | 17.0 | NCI-H292 IL-4 | 66.9
LAK cells PMA/ionomycin | 9.5 | NCI-H292 IL-9 | 59.5
NK Cells IL-2 rest | 7.0 | NCI-H292 IL-13 | 36.6
Two Way MLR 3 day | 1 S 9 | NCI-H292 IFN gamma |
Two Way MLR 5 day | 7 ft | HPAEC none |
Two Way MLR 7 day | 9.6 | HPAEC TNF alpha + IL-1 beta | 25.9
PBMC rest | | Lung fibroblast none |
PBMC PWM | 60.7 | Lung fibroblast TNF alpha + IL-1beta | 11.0
PBMC PHA-L | 15.0 | Lung fibroblast IL-4 | OS 0
Ramos (B cell) none | 31.9 | Lung fibroblast IL-9 | 20.6
Ramos (B cell) ionomycin | 94.0 | Lung fibroblast IL-13 | 18.8
B lymphocytes PWM | 42.9 | Lung fibroblast IFN gamma | 23.3
B lymphocytes CD40L and IL-4 | 24.7 | Dermal fibroblast CCD1070 rest | 59.5
EOL-1 dbcAMP | 12.9 | Dermal fibroblast CCD1070 TNF alpha | 64.2
EOL-1 dbcAMP PMA/ionomycin | 10.4 | Dermal fibroblast CCD1070 IL-1 beta | 32.8
Dendritic cells none | 19.6 | Dermal fibroblast IFN gamma | 10.7
Dendritic cells LPS | 1U. / | Dermal fibroblast IL-4 | 21.0
Dendritic cells anti-CD40 | 18.8 | IBD Colitis 2 | 2.0
Monocytes rest | 15.0 | IBD Crohn's | 3.6
Monocytes LPS | 13.8 | Colon | 36.9
Macrophages rest | 25.3 | Lung | 19.3
Macrophages LPS | 8.1 | Thymus | 72.2
HUVEC none | 19.9 | Kidney | 24.5
HUVEC starved | 35.8 | |







CNS_neurodegeneration_v1.0 Summary: Ag3380 - This panel does not show differential 
expression of the CG58590-01 gene in Alzheimer's disease. However, this expression profile 
confirms the presence of this gene in the brain. Please see Panel 1.3D for discussion of utility 
of this gene in the central nervous system. 



General_screening_panel_v1.4 Summary: Ag3380 - This gene is expressed at low to 
moderate levels in all samples on this panel. The highest level of expression is seen in breast 
cancer cell line T47D (CT=27.8). Based on expression in this panel, this gene may be involved 
in brain, colon, renal, lung, ovarian and prostate cancer as well as melanomas. Thus, 
expression of this gene could be used as a diagnostic marker for the presence of these cancers. 
Furthermore, therapeutic inhibition using antibodies or small molecule drugs might be of use 
in the treatment of these cancers. 

This gene product is also expressed in adipose, pancreas, adrenal, thyroid, pituitary, 
skeletal muscle, heart, and fetal liver. This widespread expression in tissues with metabolic 
function suggests that this gene product may be important for the pathogenesis, diagnosis, 
and/or treatment of metabolic and endocrine diseases, including obesity and Types 1 and 2 
diabetes. Furthermore, this gene is more highly expressed in fetal (CT=30.9) liver when 
compared to expression in the adult (CT>35) and may be useful for the differentiation of the 
fetal and adult sources of this tissue. 

In addition, this gene is expressed at moderate levels in all regions of the CNS 
examined. Therefore, this gene may play a role in central nervous system disorders such as 
Alzheimer's disease, Parkinson's disease, epilepsy, multiple sclerosis, schizophrenia and 
depression. 

Panel 4D Summary: Ag3380 - This gene is expressed from moderate to low levels across all 
of the samples on this panel. The highest expression is seen in small airway epithelium treated 
with TNFalpha and IL-lbeta (CT=28.7). Interestingly, expression is much lower in untreated 
small airway epithelium (CT=31.5). There is also a significant difference between 
mononuclear cells treated with PWM (CT=29.5) and untreated cells (CT=32.7). Therefore, 
expression of this gene can be used to differentiate treated and untreated samples. 
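
The CT values quoted in this summary can be related to the Rel. Exp.(%) column of Table JD with a short calculation. The sketch below assumes, for illustration only, that each PCR cycle doubles the product (an idealized 100% amplification efficiency) and that expression is normalized to the sample with the lowest CT on the panel. Neither assumption is stated in this section, and the function name `relative_expression` is hypothetical.

```python
def relative_expression(ct_values):
    """Convert CT values to percent expression relative to the lowest-CT sample.

    Assumes one cycle of CT difference corresponds to a two-fold difference
    in template (idealized efficiency; an assumption, not taken from the text).
    """
    ct_min = min(ct_values.values())
    return {name: 100.0 * 2.0 ** (ct_min - ct) for name, ct in ct_values.items()}


# CT values quoted in the Panel 4D summary for Ag3380.
panel_4d_cts = {
    "Small airway epithelium TNFalpha + IL-1beta": 28.7,
    "Small airway epithelium none": 31.5,
    "PBMC PWM": 29.5,
    "PBMC rest": 32.7,
}

for sample, percent in relative_expression(panel_4d_cts).items():
    print(f"{sample}: {percent:.1f}")
```

Under these assumptions the treated small airway epithelium sample comes out at 100 and the untreated sample at roughly 14, close to the 100.0 and 15.0 listed for these samples in Table JD.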

Expression of this gene is detected at a moderate level (CT=30.2) in normal colon 
(similar levels for colon are seen on panel 1.4 (CT=30.9)), but is significantly lower in the IBD 
Colitis 2 (CT=34.4) and IBD Crohn's (CT=33.5) samples. Therefore, therapies designed with 
the protein encoded by this gene may potentially modulate colon function and play a role 
in the identification and treatment of inflammatory or autoimmune diseases that affect the 
colon, including Crohn's disease and ulcerative colitis. 

K. CG58572-01 and CG58572-02: GLUCOSAMINE-PHOSPHATE N- 
ACETYLTRANSFERASE 

Expression of gene CG58572-01 and full length clone CG58572-02 was assessed using 
the primer-probe set Ag3375, described in Table KA. Results of the RTQ-PCR runs are shown 
in Tables KB, KC and KD. 

Table KA. Probe Name Ag3375

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-aagaagtggactggagtcagaa-3' | 22 | 58 | 399
Probe | TET-5'-tacattttctccagccatttccccaa-3'-TAMRA | 26 | 86 | 400
Reverse | 5'-agcagtacaaagaggcctcaa-3' | 21 | 135 | 401



Table KB. CNS_neurodegeneration_v1.0

Tissue Name | Rel. Exp.(%) Ag3375, Run 210154239 | Tissue Name | Rel. Exp.(%) Ag3375, Run 210154239
AD 1 Hippo | 17.1 | Control (Path) 3 Temporal Ctx | 4.8
AD 2 Hippo | 19.3 | Control (Path) 4 Temporal Ctx | 27.5
AD 3 Hippo | 7.4 | AD 1 Occipital Ctx | 11.5
AD 4 Hippo | 4.5 | AD 2 Occipital Ctx (Missing) | 0.0
AD 5 Hippo | 72.2 | AD 3 Occipital Ctx | 5.9
AD 6 Hippo | 53.6 | AD 4 Occipital Ctx | 12.7
Control 2 Hippo | 20.3 | AD 5 Occipital Ctx | 26.6
Control 4 Hippo | 6.8 | AD 6 Occipital Ctx | 19.8
Control (Path) 3 Hippo | 5.5 | Control 1 Occipital Ctx | 3.2
AD 1 Temporal Ctx | 11.6 | Control 2 Occipital Ctx | 36.1
AD 2 Temporal Ctx | 23.8 | Control 3 Occipital Ctx | 7.4
AD 3 Temporal Ctx | 5.5 | Control 4 Occipital Ctx | 4.1
AD 4 Temporal Ctx | 16.5 | Control (Path) 1 Occipital Ctx | 66.0
AD 5 Inf Temporal Ctx | 100.0 | Control (Path) 2 Occipital Ctx | 8.2
AD 5 Sup Temporal Ctx | 55.9 | Control (Path) 3 Occipital Ctx | 1.9
AD 6 Inf Temporal Ctx | 37.9 | Control (Path) 4 Occipital Ctx | 12.2
AD 6 Sup Temporal Ctx | 59.5 | Control 1 Parietal Ctx | 2.4
Control 1 Temporal Ctx | 3.5 | Control 2 Parietal Ctx | 31.6
Control 2 Temporal Ctx | 25.3 | Control 3 Parietal Ctx | 11.7
Control 3 Temporal Ctx | 8.2 | Control (Path) 1 Parietal Ctx | 49.7
Control 4 Temporal Ctx | 4.0 | Control (Path) 2 Parietal Ctx | 15.4
Control (Path) 1 Temporal Ctx | 52.9 | Control (Path) 3 Parietal Ctx | 4.2
Control (Path) 2 Temporal Ctx | 26.6 | Control (Path) 4 Parietal Ctx | 32.5



Table KC. Panel 1.3D

Tissue Name | Rel. Exp.(%) Ag3375, Run 165674233 | Tissue Name | Rel. Exp.(%) Ag3375, Run 165674233
Liver adenocarcinoma | 51.8 | Kidney (fetal) | 9.7
Pancreas | 9.3 | Renal ca. 786-0 | 19.6
Pancreatic ca. CAPAN 2 | 52.1 | Renal ca. A498 | 26.2
Adrenal gland | 8.9 | Renal ca. RXF 393 | 15.7
Thyroid | 6.3 | Renal ca. ACHN | 8.2
Salivary gland | 1 Q ^ | Renal ca. UO-31 | JJ.H
Pituitary gland | 15.1 | Renal ca. TK-10 | 9.8
Brain (fetal) | 15.5 | Liver | 20.4
Brain (whole) | 34.6 | Liver (fetal) | 16.5
Brain (amygdala) | 16.0 | Liver ca. (hepatoblast) HepG2 | 49.0
Brain (cerebellum) | 34.2 | Lung | 4.5
Brain (hippocampus) | 12.1 | Lung (fetal) | 5.4
Brain (substantia nigra) | 12.8 | Lung ca. (small cell) LX-1 | 32.3
Brain (thalamus) | 17.9 | Lung ca. (small cell) NCI-H69 | 17.3
Cerebral Cortex | 10.4 | Lung ca. (s.cell var.) SHP-77 | 30.1
Spinal cord | 13.3 | Lung ca. (large cell) NCI-H460 | 66.4
glio/astro U87-MG | 14.8 | Lung ca. (non-sm. cell) A549 | 19.1
glio/astro U-118-MG | 95.3 | Lung ca. (non-s.cell) NCI-H23 | 13.8
Astrocytoma SW1783 | 42.0 | Lung ca. (non-s.cell) HOP-62 | 18.7
neuro*;met SK-N-AS | 47.0 | Lung ca. (non-s.cl) NCI-H522 | 19.5
Astrocytoma SF-539 | 11.4 | Lung ca. (squam.) | 9.9
Astrocytoma SNB-75 | 15.6 | Lung ca. (squam.) | 19.6
glioma SNB-19 | 11.8 | Mammary gland | 14.6
glioma U251 | 40.9 | Breast ca.* (pl.ef) MCF-7 | 81.2
glioma SF-295 | 10.1 | Breast ca.* (pl.ef) MDA-MB-231 | 91.4
Heart (fetal) | 1.3 | Breast ca. (pl.ef) T47D | 35.4
Heart | 4.7 | Breast ca. BT-549 | 97.9
Skeletal muscle (fetal) | 1.2 | Breast ca. MDA-N | 14.0
Skeletal muscle | 38.7 | Ovary | 1.6
Bone marrow | 4.6 | Ovarian ca. OVCAR-3 | 39.2
Thymus | 2.7 | Ovarian ca. OVCAR-4 | 23.0
Spleen | 7.9 | Ovarian ca. OVCAR-5 | 13.8
Lymph node | 13.0 | Ovarian ca. OVCAR-8 | 8.5
Colorectal | 3.3 | Ovarian ca. IGROV-1 | 5.6
Stomach | 27.7 | Ovarian ca.* (ascites) SK-OV-3 | 44.8
Small intestine | | Uterus |
Colon ca. SW480 | 16.5 | Placenta | 2.6
Colon ca.* SW620 (SW480 met) | 29.1 | Prostate | 15.6
Colon ca. HT29 | 13.8 | Prostate ca.* (bone met) PC-3 | 56.6
Colon ca. HCT-116 | 27.7 | Testis | 40.6
Colon ca. CaCo-2 | 17.4 | Melanoma Hs688(A).T | 5.5
Colon ca. tissue (ODO3866) | 26.4 | Melanoma* (met) Hs688(B).T | 8.9
Colon ca. HCC-2998 | 32.1 | Melanoma UACC-62 | 17.8
Gastric ca.* (liver met) NCI-N87 | 100.0 | Melanoma M14 | 27.7
Bladder | 28.7 | Melanoma LOX IMVI | 6.6
Trachea | 9.4 | Melanoma* (met) SK-MEL-5 | 13.0
Kidney | 9.0 | Adipose | 8.0

Table KD. Panel 4D

Tissue Name | Rel. Exp.(%) Ag3375, Run 165296547 | Tissue Name | Rel. Exp.(%) Ag3375, Run 165296547
Secondary Th1 act | 14.0 | HUVEC IL-1beta | OA <
Secondary Th2 act | 13.0 | HUVEC IFN gamma | 24.5
Secondary Tr1 act | 17.3 | HUVEC TNF alpha + IFN gamma | 24.0
Secondary Th1 rest | 0.9 | HUVEC TNF alpha + IL4 | 23.2
Secondary Th2 rest | 1.5 | HUVEC IL-11 | 12.1
Secondary Tr1 rest | 2.9 | Lung Microvascular EC none | 21.3
Primary Th1 act | 16.0 | Lung Microvascular EC TNFalpha + IL-1beta | 24.1
Primary Th2 act | 12.1 | Microvascular Dermal EC none | 27.4
Primary Tr1 act | 25.0 | Microvascular Dermal EC TNFalpha + IL-1beta | 24.0
Primary Th1 rest | 10.4 | Bronchial epithelium TNFalpha + IL1beta | 20.3
Primary Th2 rest | 6.1 | Small airway epithelium none | 11.3
Primary Tr1 rest | 9.0 | Small airway epithelium TNFalpha + IL-1beta | 54.0
CD45RA CD4 lymphocyte act | 14.6 | Coronary artery SMC rest | 23.5
CD45RO CD4 lymphocyte act | 13.6 | Coronary artery SMC TNFalpha + IL-1beta | 12.0
CD8 lymphocyte act | 14.2 | Astrocytes rest | 5.3
Secondary CD8 lymphocyte rest | 14.4 | Astrocytes TNFalpha + IL-1beta | 5.4
Secondary CD8 lymphocyte act | 5.8 | KU-812 (Basophil) rest | 19.5
CD4 lymphocyte none | 2.4 | KU-812 (Basophil) PMA/ionomycin | 56.3
2ry Th1/Th2/Tr1_anti-CD95 CH11 | 2.6 | CCD1106 (Keratinocytes) none | 26.6
LAK cells rest | 5.1 | CCD1106 (Keratinocytes) | 7.8
LAK cells IL-2 | 10.7 | Liver cirrhosis | 2.6
LAK cells IL-2+IL-12 | 12.5 | Lupus kidney | 0.8
LAK cells IL-2+IFN gamma | 20.2 | NCI-H292 none | 28.7
LAK cells IL-2+IL-18 | 16.6 | NCI-H292 IL-4 | 54.7
LAK cells PMA/ionomycin | 12.5 | NCI-H292 IL-9 | 45.7
NK Cells IL-2 rest | 7.1 | NCI-H292 IL-13 | 24.3
Two Way MLR 3 day | u.o | NCI-H292 IFN gamma | W 2
Two Way MLR 5 day | | HPAEC none | 1 7 R
Two Way MLR 7 day | 6.0 | HPAEC TNF alpha + IL-1 beta | 30.1
PBMC rest | U.o | Lung fibroblast none | 1 0 2
PBMC PWM | 42.3 | Lung fibroblast TNF alpha + IL-1 beta | 6.3
PBMC PHA-L | 1 1 .o | Lung fibroblast IL-4 | 27.2
Ramos (B cell) none | 30.6 | Lung fibroblast IL-9 | 26.8
Ramos (B cell) ionomycin | 100.0 | Lung fibroblast IL-13 | 21.8
B lymphocytes PWM | 77.4 | Lung fibroblast IFN gamma | 29.5
B lymphocytes CD40L and IL-4 | 12.2 | Dermal fibroblast CCD1070 rest | 42.3
EOL-1 dbcAMP | 13.0 | Dermal fibroblast CCD1070 TNF alpha | 51.4
EOL-1 dbcAMP PMA/ionomycin | 6.9 | Dermal fibroblast CCD1070 IL-1 beta | 22.5
Dendritic cells none | 4.5 | Dermal fibroblast IFN gamma | 11.1
Dendritic cells LPS | J.O | Dermal fibroblast IL-4 | 17.J
Dendritic cells anti-CD40 | 2.9 | IBD Colitis 2 | 0.7
Monocytes rest | 2.2 | IBD Crohn's | 0.9
Monocytes LPS | 1.3 | Colon | 7.6
Macrophages rest | 6.6 | Lung | 6.2
Macrophages LPS | 2.7 | Thymus | 9.4
HUVEC none | 17.4 | Kidney | 4.2
HUVEC starved | 37.4 | |







CNS_neurodegeneration_v1.0 Summary: Ag3375 This panel does not show differential 
expression of the CG58572-01 gene in Alzheimer's disease. However, this expression profile 
confirms the presence of this gene in the brain. Please see Panel 1.3D for discussion of utility 
of this gene in the central nervous system. 



Panel 1.3D Summary: Ag3375 This gene is expressed at moderate to low levels in all 
samples on this panel, with the highest expression in gastric cancer cell line NCI-N87 
(CT=28.8). Based on expression in this panel, this gene may be involved in gastric, pancreatic, 
brain, colon, renal, lung, breast, ovarian and prostate cancer as well as melanomas. Thus, 
expression of this gene could be used as a diagnostic marker for the presence of these cancers. 
Furthermore, therapeutic modulation of the expression or function of this gene might be of use 
in the treatment of these cancers. 

This gene product is also expressed in adipose, pancreas, adrenal, thyroid, pituitary, 
skeletal muscle, heart, and liver. This widespread expression in tissues with metabolic function 
suggests that this gene product may be important for the pathogenesis, diagnosis, and/or 
treatment of metabolic and endocrine diseases, including obesity and Types 1 and 2 diabetes. 

In addition, this gene is expressed at moderate levels in the CNS. Therefore, this gene 
may play a role in central nervous system disorders such as Alzheimer's disease, Parkinson's 
disease, epilepsy, multiple sclerosis, schizophrenia and depression. 

Panel 4D Summary: Ag3375 The CG58572-01 gene is ubiquitously expressed on this panel, 
with highest expression in the B cell line Ramos treated with ionomycin (CT=26.2). 
Significant levels of expression are also seen in pokeweed mitogen-activated B lymphocytes. 
Therefore, therapies that antagonize the function of this gene product may be useful as 
therapeutic drugs to reduce or eliminate the symptoms in patients with autoimmune and 
inflammatory diseases in which B cells play a part in the initiation or progression of the 
disease process, such as lupus erythematosus, Crohn's disease, ulcerative colitis, multiple 
sclerosis, chronic obstructive pulmonary disease, asthma, emphysema, rheumatoid arthritis, or 
psoriasis. 

Interestingly, there is a difference between the levels of expression in resting and 
activated secondary T cells. The level in activated secondary T cells (CT=28.7-29.2) appears 
to be higher than in resting secondary T cells (CT=31.3-33.1). Therefore, therapeutics designed with the 
protein encoded by this transcript could be important in the regulation of T cell function. 
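
The CT comparison above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes ideal amplification (template doubling at every cycle), so that a difference of one CT unit corresponds to a two-fold difference in transcript level, and it assumes relative expression is scaled to the sample with the lowest CT on the panel. The function names are hypothetical and are not taken from the text. Under these assumptions the quoted ranges correspond to roughly a 4-fold to 21-fold higher level in activated than in resting secondary T cells, and the percentages computed against the Ramos/ionomycin sample (CT=26.2) fall close to the tabulated Panel 4D values.

```python
def fold_difference(ct_lower_expr: float, ct_higher_expr: float) -> float:
    """Approximate fold difference between two samples from their CT values.

    Assumption (not stated in the text): amplification efficiency is ideal,
    i.e. the template doubles each cycle, so fold = 2 ** (delta CT).
    """
    return 2.0 ** (ct_lower_expr - ct_higher_expr)


def relative_expression_percent(ct: float, ct_reference: float) -> float:
    """Expression relative to the reference (lowest-CT) sample, in percent.

    Assumption: the reference sample is assigned 100%.
    """
    return 100.0 * 2.0 ** (ct_reference - ct)


# CT ranges quoted in the Panel 4D summary.
activated = (28.7, 29.2)   # activated secondary T cells
resting = (31.3, 33.1)     # resting secondary T cells
ct_ramos_ionomycin = 26.2  # highest-expressing sample on the panel

# Most conservative and most generous estimates of the activation effect.
smallest = fold_difference(min(resting), max(activated))
largest = fold_difference(max(resting), min(activated))
print(f"activated vs. resting: {smallest:.1f}- to {largest:.1f}-fold")

for label, ct in (("activated", activated), ("resting", resting)):
    hi = relative_expression_percent(min(ct), ct_ramos_ionomycin)
    lo = relative_expression_percent(max(ct), ct_ramos_ionomycin)
    print(f"{label}: {lo:.1f}% to {hi:.1f}% of the Ramos/ionomycin level")
```

Real assays rarely reach ideal efficiency, so these figures should be read as order-of-magnitude estimates rather than measured fold changes.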

L. CG58564-01 and CG58564-02: PROTEIN TYROSINE PHOSPHATASE - 

Expression of gene CG58564-01 and full length clone CG58564-02 was assessed using 
the primer-probe sets Ag3023 and Ag3373, described in Tables LA and LB. Results of the 
RTQ-PCR runs are shown in Tables LC, LD, LE and LF. 

Table LA. Probe Name Ag3023 

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-ctaatgctggatttgtccatca-3' | 22 | 492 | 402
Probe | TET-5'-tcaggaatatgaagccatctacctagca-3'-TAMRA | 28 | 517 | 403
Reverse | 5'-tggagtggtgacatcatctgta-3' | 22 | 555 | 404


Table LB. Probe Name Ag3373 

Primers | Sequences | Length | Start Position | SEQ ID NO:
Forward | 5'-atttgtccatcaacttcaggaa-3' | 22 | 502 | 405
Probe | TET-5'-tgaagccatctacctagcaaaattaaca-3'-TAMRA | 28 | 526 | 406
Reverse | 5'-tggagtggtgacatcatctgta-3' | 22 | | 407
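
The entries of Tables LA and LB can be cross-checked mechanically. The sketch below is a consistency check only, not part of the described method. It assumes that each start position is the 1-based coordinate of the first base of the oligonucleotide and that forward primers and probes are written on the same strand. The names `PRIMERS`, `length_mismatches` and `overlap_consistent` are hypothetical. The reverse primers are left out of the overlap check because their strand orientation is not stated, and the Ag3373 reverse start position is blank in the table.

```python
# Sequences, tabulated lengths and start positions transcribed from Tables LA and LB.
# A start position of None means the table cell is blank.
PRIMERS = {
    "Ag3023 forward": ("ctaatgctggatttgtccatca", 22, 492),
    "Ag3023 probe": ("tcaggaatatgaagccatctacctagca", 28, 517),
    "Ag3023 reverse": ("tggagtggtgacatcatctgta", 22, 555),
    "Ag3373 forward": ("atttgtccatcaacttcaggaa", 22, 502),
    "Ag3373 probe": ("tgaagccatctacctagcaaaattaaca", 28, 526),
    "Ag3373 reverse": ("tggagtggtgacatcatctgta", 22, None),
}


def length_mismatches(primers):
    """Return the names whose sequence length differs from the tabulated length."""
    return [name for name, (seq, length, _) in primers.items() if len(seq) != length]


def overlap_consistent(a, b):
    """Check that two same-strand oligos agree wherever their coordinates overlap.

    Returns True vacuously when the two oligos do not overlap at all.
    """
    (seq_a, _, start_a), (seq_b, _, start_b) = a, b
    if start_b < start_a:
        return overlap_consistent(b, a)
    shared = seq_a[start_b - start_a:]  # part of a at or beyond b's start
    n = min(len(shared), len(seq_b))
    return shared[:n] == seq_b[:n]


print("length mismatches:", length_mismatches(PRIMERS))
print("forward primers agree:",
      overlap_consistent(PRIMERS["Ag3023 forward"], PRIMERS["Ag3373 forward"]))
print("probes agree:",
      overlap_consistent(PRIMERS["Ag3023 probe"], PRIMERS["Ag3373 probe"]))
```

Agreement in the overlapping regions is what one would expect if both primer-probe sets were designed against the same stretch of the transcript.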



Table LC. CNS_neurodegeneration_v1.0 

Tissue Name | Rel. Exp.(%) Ag3023, Run 209821074 | Rel. Exp.(%) Ag3373, Run 210154071 | Tissue Name | Rel. Exp.(%) Ag3023, Run 209821074 | Rel. Exp.(%) Ag3373, Run 210154071
AD 1 Hippo | 10.9 | 16.8 | Control (Path) 3 Temporal Ctx | 9.1 | 8.0
AD 2 Hippo | 34.2 | 37.6 | Control (Path) 4 Temporal Ctx | 40.6 | 65.5
AD 3 Hippo | 12.0 | 15.8 | AD 1 Occipital Ctx | 24.7 | 29.1
AD 4 Hippo | 13.8 | 10.3 | AD 2 Occipital Ctx (Missing) | 0.0 | 0.0
AD 5 Hippo | 60.7 | 57.8 | AD 3 Occipital Ctx | 14.7 | 15.0
AD 6 Hippo | 80.7 | 72.2 | AD 4 Occipital Ctx | 35.4 | 22.4
Control 2 Hippo | 35.8 | 38.4 | AD 5 Occipital Ctx | 3.9 | 30.4
Control 4 Hippo | 16.5 | 11.7 | AD 6 Occipital Ctx | 46.0 | 37.4
Control (Path) 3 Hippo | 13.1 | 15.4 | Control 1 Occipital Ctx | 9.9 | 10.7
AD 1 Temporal Ctx | 39.0 | 31.4 | Control 2 Occipital Ctx | 39.0 | 38.4
AD 2 Temporal Ctx | 38.7 | 73.2 | Control 3 Occipital Ctx | 23.0 | 20.6
AD 3 Temporal Ctx | 9.5 | 13.2 | Control 4 Occipital Ctx | 13.3 | 13.3
AD 4 Temporal Ctx | 27.9 | 34.9 | Control (Path) 1 Occipital Ctx | 80.1 | 76.3
AD 5 Inf Temporal Ctx | 59.0 | 100.0 | Control (Path) 2 Occipital Ctx | 17.3 | 20.0
AD 5 Sup Temporal Ctx | 33.2 | 44.1 | Control (Path) 3 Occipital Ctx | 8.4 | 8.7
AD 6 Inf Temporal Ctx | 100.0 | 73.2 | Control (Path) 4 Occipital Ctx | 21.2 | 20.6
AD 6 Sup Temporal Ctx | 79.6 | 80.1 | Control 1 Parietal Ctx | 12.1 | 16.3
Control 1 Temporal Ctx | 10.2 | 13.7 | Control 2 Parietal Ctx | 48.0 | 40.9
Control 2 Temporal Ctx | 41.2 | 31.9 | Control 3 Parietal Ctx | 17.9 | 16.3
Control 3 Temporal Ctx | 20.3 | 20.0 | Control (Path) 1 Parietal Ctx | 74.7 | 64.2
Control 4 Temporal Ctx | 9.7 | 9.9 | Control (Path) 2 Parietal Ctx | 28.9 | 59.9
Control (Path) 1 Temporal Ctx | 59.9 | 68.3 | Control (Path) 3 Parietal Ctx | 10.2 |
Control (Path) 2 Temporal Ctx | 40.3 | 41.2 | Control (Path) 4 Parietal Ctx | 44.8 | 43.8



Table LD. General_screening_panel_v1.4 

Tissue Name | Rel. Exp.(%) Ag3373, Run 217043119 | Tissue Name | Rel. Exp.(%) Ag3373, Run 217043119
Adipose | 12.0 | Renal ca. TK-10 | 20.3
Melanoma* Hs688(A).T | [illegible] | Bladder |
Melanoma* Hs688(B).T | | Gastric ca. (liver met.) NCI-N87 |
Melanoma* M14 | 15.0 | Gastric ca. KATO III | 30.8
Melanoma* LOXIMVI | | | 9.7
Melanoma* SK-MEL-5 | 21.5 | Colon ca. SW480 | 35.1
Squamous cell carcinoma SCC-4 | 33.0 | Colon ca.* (SW480 met) SW620 | 13.9
Testis Pool | 19.8 | Colon ca. HT29 | 8.5
Prostate ca.* (bone met) PC-3 | 100.0 | Colon ca. HCT-116 | 36.9
Prostate Pool | 9.2 | Colon ca. CaCo-2 | 42.9
Placenta | 3.8 | Colon cancer tissue | 9.0
Uterus Pool | 7.4 | Colon ca. SW1116 | 5.8
Ovarian ca. OVCAR-3 | 28.5 | Colon ca. Colo-205 | 4.3
Ovarian ca. SK-OV-3 | 40.3 | Colon ca. SW-48 | 4.2
Ovarian ca. OVCAR-4 | | Colon Pool | [illegible]
Ovarian ca. OVCAR-5 | 35.1 | Small Intestine Pool | 12.2
Ovarian ca. IGROV-1 | 10.9 | Stomach Pool | 9.9
Ovarian ca. OVCAR-8 | 9.2 | Bone Marrow Pool | 11.6
Ovary | 9.7 | Fetal Heart | 20.7
Breast ca. MCF-7 | 37.6 | Heart Pool | 10.6
Breast ca. MDA-MB-231 | 37.1 | Lymph Node Pool | 17.9
Breast ca. BT 549 | 62.4 | Fetal Skeletal Muscle | 12.3
Breast ca. T47D | 61.1 | Skeletal Muscle Pool | 16.0
Breast ca. MDA-N | 10.0 | Spleen Pool | 11.6
Breast Pool | 17.3 | Thymus Pool | 12.2
Trachea | 12.0 | CNS cancer (glio/astro) U87-MG | 29.1
Lung | 6.7 | CNS cancer (glio/astro) U-118-MG | 69.3
Fetal Lung | 34.2 | CNS cancer (neuro;met) SK-N-AS | 34.9
Lung ca. NCI-N417 | 5.4 | CNS cancer (astro) SF-539 | 19.1
Lung ca. LX-1 | 17.2 | CNS cancer (astro) SNB-75 | 35.8
Lung ca. NCI-H146 | 3.0 | CNS cancer (glio) | 11.3
Lung ca. SHP-77 | 18.6 | CNS cancer (glio) SF-295 | 26.4
Lung ca. A549 | 29.1 | Brain (Amygdala) Pool | 4.5
Lung ca. NCI-H526 | [illegible] | Brain (cerebellum) | 0.1
Lung ca. NCI-H23 | [illegible] | Brain (fetal) |
Lung ca. NCI-H460 | 18.2 | Brain (Hippocampus) Pool | 5.3
 | | Cerebral Cortex Pool |
Lung ca. NCI-H522 | 31.6 | Brain (Substantia nigra) Pool | 4.8
Liver | 1.2 | Brain (Thalamus) Pool | 8.0
Fetal Liver | 32.3 | Brain (whole) | 6.2
Liver ca. HepG2 | 14.6 | Spinal Cord Pool | 6.6
Kidney Pool | 22.1 | Adrenal Gland | 8.1
Fetal Kidney | 26.1 | Pituitary gland Pool | 3.0
Renal ca. 786-0 | 28.7 | Salivary Gland | 4.7
Renal ca. A498 | 11.3 | Thyroid (female) | 4.4
Renal ca. ACHN | 12.2 | Pancreatic ca. CAPAN2 | 17.3
Renal ca. UO-31 | 24.1 | Pancreas Pool | 17.1



Table LE. Panel 1.3D 



Tissue Name | Rel. Exp.(%) Ag3023, Run 167966931 | Tissue Name | Rel. Exp.(%) Ag3023, Run 167966931
Liver adenocarcinoma | 31.1 | Kidney (fetal) | 26.2
Pancreas | 6.1 | Renal ca. 786-0 | 34.2
Pancreatic ca. CAPAN 2 | 17.7 | Renal ca. A498 | 17.6
Adrenal gland | 3.8 | Renal ca. RXF 393 | 17.2
Thyroid | 3.0 | Renal ca. ACHN | 13.5
Salivary gland | 3.9 | Renal ca. UO-31 | 0.0